Preface



Evaluation and assessment in science and mathematics education is
the topic of two Unesco titles: this resource document and the
forthcoming Volume III of 'Innovations in Science and Technology
Education'.

In August 1988, the Sixth International Congress on
Mathematical Education (ICME 6) brought together in Budapest,
Hungary, some 2,500 mathematics educators from seventy-two
countries. One of the Theme Groups was devoted to evaluation and
assessment in mathematics education. From the forty-seven papers
presented in the thirteen sessions of this theme group, a selection
was made to constitute this resource document.

This document complements 'Mathematics for All', which also
appeared in Unesco's Science and Technology Education Document
Series (Volume 20). As Thomas Romberg reminds us in his survey
paper, the goals of a 'mathematics for all' policy differ
from what happens in mathematics classrooms, and evaluation and
assessment quantify this difference. Part 2 of the document
centres on the Second International Mathematics Study, which
examined the mathematics curriculum from three points of view:
the intended, the implemented, and the attained. National
initiatives in evaluation and selected topics make up the final
parts of the document.

Unesco wishes to express its appreciation to the editor,
David Robitaille, to the twenty-two authors of papers, to the
University of British Columbia for preparing the manuscript, and
to ICME 6.

The views expressed in this report are those of the authors
and not necessarily those of Unesco.

We welcome comments on the contents of this document, 
which should be sent to: Division of Science, Technical and 
Environmental Education, Unesco, Place de Fontenoy, 75700 Paris, 
France. 



Introduction



In recent years, an increasing amount of international attention in the 
mathematics education community has been focussed on evaluation and assessment.
The organizers of ICME 6 acknowledged the level of interest in this topic
by including a Theme Group on evaluation and assessment in the conference 
program to provide a forum for international discussion of evaluation activities in 
mathematics education. The members of the panel responsible for arranging the 
work of the Theme Group included Antoine Bodin (France), Raimondo Bolletta 
(Italy), Desmond Broomes (Barbados), David Robitaille (Canada; Chief Organ- 
izer), Toshio Sawada (Japan), and Julia Szendrei (Hungary). 

Over a four-day period, 13 sessions were scheduled for Theme Group T4, 
and 47 papers were accepted for presentation by scholars from 13 countries. 
Because of limitations of space, it has not been possible to publish all of the papers 
accepted for presentation; in fact, only 15 are included in this collection. Brief
summaries of all the papers presented at ICME 6 may be found in the official
proceedings of the conference.

The papers included here have been divided into four groups. The first
group consists of a single paper, the Survey Paper prepared for the
conference by Tom Romberg. The second section consists of papers dealing with
findings from the Second IEA Mathematics Study, which was conducted in some
20 countries in the early 1980s. The next group of papers focusses on national
initiatives in evaluation in mathematics and includes several papers on this topic
from the United Kingdom. The final set of papers deals with a variety of topics,
including evaluation of students' problem-solving activities, diagnostic assessment,
and evaluation of students' understanding of selected concepts.

Preparing a set of papers for publication requires a considerable amount of
work in the best of circumstances. When the papers are submitted by authors from
13 countries, in a variety of formats, and in varying degrees of readiness for
publication, the task can assume very large proportions indeed. I have been
fortunate to have highly skilled and very dedicated assistance in this task, and I am
very grateful to all those who helped in any way. First of all, I would like to thank
Lori Teichman, a student in the teacher-education program at the University of
British Columbia, who transferred all of the papers into Microsoft Word ™ for the
Macintosh microcomputer. My thanks also go to Brian Kilpatrick, a technician in
the Department of Mathematics and Science Education at U.B.C., who provided
technical support and advice, and to Michael Howell-Jones, of Education
Audio-Visual Services of the Faculty of Education at U.B.C., who produced the
camera-ready version of the document using PageMaker ™. A special word of thanks
goes to my colleague, James Sherrill, for proof-reading the entire set of papers so
conscientiously.

I would also like to thank Unesco, and particularly Ed Jacobsen, for agreeing
to publish this set of papers. Finally, I would like to thank Nancy Sheehan, Dean
of the Faculty of Education at the University of British Columbia, for providing the
resources needed to have the papers prepared for publication.



David F. Robitaille 



Vancouver 
January, 1989 
EVALUATION: A COAT OF MANY COLORS



Thomas A. Romberg



"EVALUATE: to judge or determine the worth or
quality of." (Webster's New World Dictionary, 1985, p. 484)

Evaluation in education has evolved from an 
initial and single concentration on measurement of 
achievement in order to make judgments about stu- 
dents, to the current and growing interest in providing 
information to support policy and program decision 
making. To make those latter judgments, information 
from students about their mathematical achievement is 
usually used. Thus, this paper examines both the
methods of gathering information from students and
the use of that information to make a variety of
judgments.

The assessment of student performance in schools
has a long history. However, contemporary models for
both the gathering of performance data and the use of
the information for policy and program decision making
have evolved only during the past quarter-century.
The purposes of this survey paper are: 

1. to relate the gathering of assessment data to
educational decision making;

2. to trace the history of this evolution. The assessment
history begins in the 19th century and the
evaluation history in the 1930s. However, in both
cases, the developments in the past decade are
stressed;

3. to illustrate the strengths and weaknesses of two 
contemporary social policy evaluation models. 
These are: evaluations of the impact of new 
mathematics programs, and large-scale profile 
evaluations; and 

4. to describe four recent trends in evaluation and 
assessment. 

Although the history and trends in assessment
and evaluation are not unique to school mathematics,
the emphasis and examples in this paper are all on
assessing mathematical performance and the use of that
information in instructional and policy contexts. Also,
the examples have been selected to reflect the variety
of models, methods, and procedures used throughout
the world.



The principal point to be understood is that
at present there is considerable disparity
between theory and practice. Academic considerations
about goals, decisions, methods of gathering
information, and the validity of that information are in sharp
contrast to the political and practical expectations of
many governments and administrators. What is
possible differs from what is done.

Educational Decision Making 

The following examples illustrate
the relationships between measures of achievement
and the variety of situations in which that information
is used to make judgments (hence the title of this
paper):

1. A student has decided to study biology and would
like to know whether she has the prerequisite
knowledge to enroll in a biometrics course.

2. The admissions committee of a tertiary institution 
must select 100 students from some 800 who have 
applied for an engineering program. 

3. A teacher would like to grade students on how 
well they understood the chapter on simultaneous 
linear equations just completed. 

4. An official in a Department of Education has been 
asked to provide a legislative committee with infor- 
mation about pupil performance in mathematics. 

5. A publishing company is interested in developing
a text to teach specific concepts of statistics to
students in middle school. It needs feedback from
teachers about the adequacy of the materials (i.e.,
what things were successful and what things were
not) so that improvements can be made.

6. A researcher interested in early cognitive
development with respect to mathematics would like to
assess preschool children's ability to handle certain
mathematical relationships, such as comparison of
two sets with respect to numerosity.

7. An employer is interested in the mathematical 
capability of job applicants. 

8. An official must decide which students are to be 
admitted to academic high schools and which to 
technical schools. 
These examples are only a few of the typical
situations in which judgments are made using
information from students about their mathematical
performance. In addition, they reflect the diversity of
judgments (qualification, selection, placement, diagnosis,
grading, profiling, researching, and so forth) as well
as the variety of personnel involved in those decisions
(students, administrators, teachers, developers,
employers, and researchers).

Based on these examples, I have assumed that
information from students about their mathematical
achievement is important and that this information should
influence educational decisions. The scenarios cited
here are but a few examples of the many decisions
facing educators throughout the world. Whether
achievement data as a source of information actually
influences schooling decisions is a separate and distinct
empirical question. Nevertheless, valid data about
student achievement should be available and used
when making many such decisions.

Also, we must ask: How should such information
be elicited? The answer to this question is based on
a second assumption: the methods of gathering information
(how data are collected, from whom, and how they
are aggregated, organized, and reported) depend on the
decisions to be made.

From these assumptions and the examples given
above, I believe three elements of the decision-making
process should be considered.

1. The decisions must be specifically identified. 
Gathering information without an explicit 
purpose in mind wastes time and resources. 
Although it is now fashionable to create data bases 
under the assumption that having such data will be 
useful, it has been shown that such data bases are 
rarely used or of value unless the purposes for 
which the data are to be used have been considered
first. 

2. The implications of the judgments to be made (or 
the questions to be answered) must be examined. 
This involves considering error in measurement, 
the errors in judgment (both Type I and Type II) 
that one is willing to tolerate, and whether the
decisions are irrevocable. Teachers may be willing 
to accept considerable measurement error when 
administering chapter tests because they can rely 
on other information to judge a student's progress; 
a developer may be willing to live with high 
judgment errors in the development of a new 
instructional unit; while an admissions committee 
should seek minimal measurement error in choos- 
ing which applicants to accept to a program. 



3. The "unit" about which the decisions are to be 
made must be determined (individuals, groups, 
classes, schools, materials, research questions, etc.). 
It has long been common practice to test all 
students on every item in every test; data from 
individuals can then be aggregated at any group 
level for any purpose. This practice is extremely 
wasteful, both in terms of cost and time. For 
example, the administration of a standardized test 
merely to publish the results in the local press (as 
is common in the United States of America) is
wasteful both of student time and of the cost of
administration and scoring. Profiling school
performance can be accomplished more efficiently.

In summary, to assess student performance in
mathematics, one should consider the kinds of judgments
that need to be made and tailor the assessment
procedures in light of considerations about those
decisions. This is particularly important because the
information is being used by policy makers to make
programmatic decisions.

History of Assessment and Evaluation 

The history of the measurement of human 
behavior, with primary reference to the capacities and 
educational attainments of school students, may be 
divided roughly into four periods. During the first
period, from the beginning of historical records to
about the 19th century, measurement in education was
quite crude. During the second period, embracing
approximately the 19th century, educational measurement
began to assimilate from various sources the ideas
and the scientific and statistical techniques that
were later to result in the psychometric testing
movement. The third period, dating from about 1900 to the
1960s, can be characterized as the psychometric
period. The final period, dating from the 1960s to the
present, is the policy-program evaluation period.

Early Examinations 

The initiation ceremonies by which primitive
tribes tested the knowledge of tribal customs, endurance,
and bravery of young men prior to admission to
the ranks of adult males may be among the earliest
examinations employed by human beings. Use of a crude oral
test was reported in the Old Testament, and Socrates
is known to have employed searching types of oral
quizzing. Elaborate and exhaustive written examinations
were used by the Chinese as early as 2200 B.C. in
the selection of their public officials. These illustrations
may be classified as historical antecedents of
performance tests, oral examinations, and essay tests.
However, there is no evidence that different individuals
ever took the same tests, and all judgments were
made by officials in a manner similar to examinations
given to doctoral students today.
Educational Testing in the 19th Century

Three persons made outstanding contributions
to 19th-century developments. The ideas of these
men (Horace Mann, George Fisher, and J. M. Rice)
appear to be forerunners of developments during the
present century.

The first school examinations of note appear to
be those instituted in the Boston schools in 1845 in the
United States as substitutes for oral tests when
enrollments became so large that the school committee
could no longer examine all pupils orally. These written
examinations, in arithmetic, astronomy, geography,
grammar, history, and natural philosophy,
impressed Horace Mann, then secretary of the
Massachusetts Board of Education. As editor of the Common
School Journal, he published extracts from them and
concluded that the new written examination was
superior to the old oral test in these respects:

1. It is impartial.

2. It is just to the pupils.

3. It is more thorough than older forms of
examination.

4. It prevents the "officious interference" of the
teacher.

5. It "determines, beyond appeal or gainsaying,
whether the pupils have been faithfully and
competently taught."

6. It takes away "all possibility of favoritism."

7. It makes the information obtained available to all.

8. It enables all to appraise the ease or difficulty of
the questions.

(Greene, Jorgensen, & Gerberich, 1953)

Although these ideas are those embodied in
modern tests, the instruments themselves were
inadequate. However, in successive issues of the Common
School Journal, Mann suggested most of the elements of
examinations that are found in contemporary measurement
(e.g., timed responses by students to identical
questions).

To Reverend George Fisher, an English schoolmaster,
goes the credit for devising and using what were
probably the first objective measures of achievement.
His "scale books," used in the Greenwich Hospital
School as early as 1864, provided means for evaluating
accomplishments in handwriting, spelling, mathematics,
grammar and composition, and several other school
subjects. Specimens of pupil work were compared with
"standard specimens" to determine numerical ratings
that, at least for spelling and a few other subjects,
depended on errors in performance (Greene, Jorgensen,
& Gerberich, 1953). Scoring procedures for many
examinations still follow this procedure (e.g., the English
"O" level exams).

The use of test information for program evaluation
was first developed by J. M. Rice, an American
physician. In 1894, he developed a spelling test.
Having administered a list of spelling words to pupils in
many school systems and analyzed the results, Rice
found that pupils who had studied spelling 30 minutes
a day for eight years were not better spellers than
children who had studied the subject 15 minutes a day
for eight years. Rice was attacked and reviled for this
"heresy," and some educators even attacked the use of
a measure of how well pupils could spell for evaluating
the efficiency of spelling instruction. They contended
that spelling was taught to develop the pupils' minds
and not to teach them to spell. It was more than ten
years later that Rice's pioneering resulted in significant
attention to objective models in educational testing
(Ayres, 1918).

The Psychometric Period 

This era began shortly after the turn of the
century. Although the historical antecedents sketched
in the preceding paragraphs were essential prerequisites,
developments first in mental testing and shortly
after in achievement testing are at the roots of this era.

General Intelligence Tests. Attempts to measure gen- 
eral intelligence, or ability to learn or ability to adapt 
oneself to new situations, had been made both in the 
United States of America and in France. The first in- 
dividual test was developed in France, and the first 
group test was developed some years later in the United 
States of America. 

Individual intelligence scales were originated in
1905 by Binet and Simon in France. Their first scale was
devised primarily for the purpose of selecting mentally
retarded pupils who required special instruction. This
pioneer individual-intelligence scale was based on
interpreting the relative intelligence of different children at
any given chronological age by the number of questions
of varied types and increasing levels of difficulty they
could answer. These characteristics were all re-embodied
in the 1908 and 1911 revisions of the Binet-Simon
Scale and remain basic to most individual intelligence
scales today. The 1908 revision introduced the
fundamentally important concept of mental age (MA) and
provided means for obtaining it (Freeman, 1930).
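As a worked illustration of how mental age was turned into a single comparative index, the classical ratio intelligence quotient (IQ) divides MA by chronological age (CA) and multiplies by 100. The numbers below are hypothetical: a child of CA 8 who answers the questions typically passed by 10-year-olds is assigned an MA of 10.

```latex
\mathrm{IQ} = 100 \times \frac{\mathrm{MA}}{\mathrm{CA}}
            = 100 \times \frac{10}{8} = 125
```

Later individual scales replaced this ratio with deviation scores, but the ratio form belongs to the early period described here.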

The first group intelligence test was Army Al- 
pha, used for the measurement and placement of 
American army recruits and draftees during World War 
I. It was the product of the collaboration of various
psychologists working on group intelligence tests when
the United States entered the war.

Aptitude Tests. The measurement of aptitudes, or
those potentialities for success in an area of performance
that exist prior to direct acquaintance with that
area, was closely related to intelligence testing. Early
attempts to measure general intelligence tested many
specific traits and aptitudes, but that approach was
abandoned after Binet showed that tests of more complex
forms of behavior were superior. It was soon apparent,
however, that general intelligence tests were not
highly predictive of certain types of performance,
especially in the trades and industries. Münsterberg's
aptitude tests for telephone girls and streetcar motormen
were followed, prior to 1930, by tests of mechanical
aptitude, musical aptitude, art aptitude, clerical aptitude,
and aptitude for various subjects of the high school and
college curricula (Watson, 1938). Spearman's
(1904) splitting of total mental ability into a general
factor and many specific factors had its influence on
this movement.

Achievement Tests. Modern achievement testing was
stimulated by Thorndike's (1904) book on mental, social,
and educational measurements. Through his book
and his influence on his students, Thorndike was
predominantly responsible for the early development of
standardized tests. Stone, a student of Thorndike's,
published the first arithmetic reasoning test in 1908.
Between 1909 and 1915, a series of arithmetic tests and
scales for measuring abilities in English composition,
spelling, drawing, and handwriting were published
(Odell, 1930). Literally thousands of standardized
achievement tests have been published during the last
half-century.

The reasons for presenting this brief history of
testing are threefold. First, what is referred to as the
modern testing movement began with a selection problem
(Binet & Simon) and a placement problem (Army
Alpha). It was assumed that a single measure (e.g., MA)
or index (e.g., IQ) could be developed to compare
individuals on what was assumed to be a general, fixed,
unidimensional trait. In turn, the procedures that
evolved in developing and administering these tests
were used in aptitude and achievement tests. Second,
the testing procedures now considered typical in many
countries were developed for group administration of
early intelligence tests. Such tests are composed of a set
of questions (items), each having one unambiguous
answer. In this sense, such tests are "objective," since
subjective inferences are not necessary. All subjects are
administered the same items under standard (nearly
identical) conditions with the same instructions, time
constraints, etc. Furthermore, subjects' answers can be
easily scored as correct or not, the total number of
correct answers tallied, tallies transformed, and
transformed scores compared. Psychometrics, involving the
application of statistical procedures to such tests,
developed as a field of study in the 1920s.
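The scoring routine just described (score each item right or wrong, tally, transform, compare) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the answer key, the three answer sheets, and the choice of z-scores and T-scores (mean 50, standard deviation 10) as the transformation are all hypothetical and are not taken from any particular test.

```python
from statistics import mean, pstdev

def raw_scores(responses, key):
    """Tally correct answers: each item scores 1 if it matches the key, else 0."""
    return [sum(1 for given, correct in zip(sheet, key) if given == correct)
            for sheet in responses]

def standard_scores(raw):
    """Transform raw tallies to z-scores and T-scores (mean 50, SD 10)."""
    mu, sigma = mean(raw), pstdev(raw)
    if sigma == 0:
        raise ValueError("all raw scores are equal; nothing to standardize")
    z = [(r - mu) / sigma for r in raw]
    t = [50 + 10 * zi for zi in z]
    return z, t

# Hypothetical five-item multiple-choice key and three examinees' answer sheets.
key = ["B", "D", "A", "C", "A"]
responses = [
    ["B", "D", "A", "C", "A"],
    ["B", "A", "A", "C", "B"],
    ["C", "A", "B", "C", "B"],
]

raw = raw_scores(responses, key)
z, t = standard_scores(raw)
print(raw)                        # [5, 3, 1]
print([round(v, 1) for v in t])   # [62.2, 50.0, 37.8]
```

Because every examinee answers the same items under the same conditions, the transformed scores can be compared directly, which is the "objective" property the paragraph above describes.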

Most importantly, it should be understood that
the testing movement was a product of a historical era.
It grew out of the machine-age thinking of the industrial
revolution of the past century. Business, industry,
and, in particular, schools have been conceived, modified,
and operated based on this mechanical view of the
world since before the turn of the century.

The Policy-Program Evaluation Period 

Information about student achievement has
long been used by teachers and educators to make
decisions about students. However, the use of that
information for wide-scale policy or program judgments
is recent. It began with the burst of reform
policies associated with the mid-1960s Great Society
initiatives in the United States. Federal-level insistence
on evaluation of those initiatives was thrust upon
a largely unprepared field. Little expertise existed in
the agencies responsible for carrying out evaluations in
areas as diverse as bilingual education, career education,
compensatory programs, reading, or mathematics.
In fact, in the United States the initial training
institute on program evaluation was held at the
University of Illinois in 1963 (directed by Lee Cronbach).

That early work followed the notions of Ralph
Tyler (1931), the "father of educational evaluation."
His conception of evaluation involved comparison
between intended program objectives and observed outcomes.
Tyler's model of evaluation in education dominated
until the 1970s, when that approach, like traditional
social science models, was found inadequate as guidance
for policy and practice. That evaluation model
was based on the hypothetico-deductive traditions of
"hard science." It focused on outcomes and sought
"significant differences." Initial evaluations of federal
education programs used experimental methodology
to assess student achievement and program
accomplishments. As applied, this approach paid little
attention to the context of program activities or the
processes by which program plans were translated into
practice (Eash, 1985; O'Keefe, 1984). Talk about
evaluation included fairly rigid rules for "good" design
and "scientific" evaluation. In particular, evaluators
gathered data on student performance using standard
achievement tests.

In summary, evaluation for policy and program
purposes began in the 1960s by attempting to apply
"scientific" principles using notions from experimental
sciences. The information from students came from tests
based on the psychometric assessment techniques
outlined above. Again, this approach to evaluation is a
product of "industrial age" thinking.

Two Social Policy Evaluation Models 

Policy makers (legislators, government officials,
school administrators, ...) must make many decisions
related to the teaching and learning of mathematics.
In this section, two evaluation models often used by
policy makers are examined in detail so that their
strengths and weaknesses are apparent.
Program Evaluation

Attempts to evaluate the impact of new curriculum
programs involved comparing the performance
of a group of students who had studied mathematics
from that curriculum with that of an alternate group (most
often a non-equivalent group). Performance was measured
for both groups using scores derived from the
same instrument. Initially, in the United States,
standardized tests were used; later it became common to use
criterion-referenced tests.

Norm-referenced standardized tests have become
an annual ritual in most American schools. Such
tests are designed to indicate a respondent's position in
a population. Each test comprises a set of independent,
multiple-choice questions. The items have
necessarily been subjected to a preliminary trial with a
representative pupil group so that it is possible to
arrange them in the desired manner with respect to
difficulty and the degree to which they discriminate
among students. Also, the test is accompanied by a
chart or table used to transform test results into
meaningful characterizations of pupil mental ability or
achievement (grade-equivalent scores, percentiles,
stanines, etc.).
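The conversion chart mentioned above can be mimicked with a short Python sketch. It assumes a hypothetical norm group of raw scores, defines the percentile rank as the percentage of the norm group scoring below a given score plus half of those scoring exactly at it (one common convention among several), and maps percentile ranks to stanines using the conventional 4-7-12-17-20-17-12-7-4 percent bands. Published tests supply their own norm tables; nothing here reproduces any real test's norms.

```python
from bisect import bisect_left, bisect_right

# Cumulative upper bounds (percent) of stanines 1-8; stanine 9 covers the rest.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def percentile_rank(score, norm_scores):
    """Percent of the norm group below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100 * (below + 0.5 * ties) / len(ordered)

def stanine(pr):
    """Map a percentile rank (0-100) to a stanine (1-9)."""
    return bisect_right(STANINE_CUTS, pr) + 1

# Hypothetical norm group: 100 pupils with raw scores 1 through 100.
norm = list(range(1, 101))
for raw in (3, 50, 98):
    pr = percentile_rank(raw, norm)
    print(raw, pr, stanine(pr))
# 3 2.5 1
# 50 49.5 5
# 98 97.5 9
```

Because a stanine depends only on rank within the norm group, a norm group whose raw scores all lie within a point or two of each other still yields stanines from 1 to 9, so "low" and "high" labels appear however small the real differences are.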

Three features of such tests merit comment.
First, although each test is designed to order individuals
on a single (unidimensional) trait, such as quantitative
aptitude, the derived score is not a direct measure
of that trait. Second, because individual scores are
compared with those of a norm population, there will
always be some high and some low scores. This is true
even if the range of scores is small. Thus, high and low
scores cannot fairly or accurately be judged as "good" or
"bad" with respect to the underlying trait. Third, test
items are assumed to be equivalent to one another.
They are selected on the basis of general level of
difficulty (p-value) and some index of discrimination
(e.g., non-spurious biserial correlation). Furthermore,
no claim is made that the items are representative of any
well-defined domain.
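The two item-selection indices named above can be computed from a matrix of right/wrong item responses. The Python sketch below is illustrative only. The response matrix is hypothetical. For the discrimination index it substitutes the point-biserial correlation between an item and the total score on the remaining items (the "corrected," non-spurious total). The biserial correlation mentioned in the text is a related but different coefficient that assumes a normally distributed underlying trait, and it is not implemented here.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable does not vary
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def item_statistics(matrix):
    """Return (p_value, discrimination) for each item.

    matrix[s][i] is 1 if student s answered item i correctly, else 0.
    Discrimination correlates the item with the total on the *other* items,
    so an item cannot inflate its own correlation (the non-spurious form).
    """
    stats = []
    for i in range(len(matrix[0])):
        item = [row[i] for row in matrix]
        rest = [sum(row) - row[i] for row in matrix]
        stats.append((mean(item), pearson(item, rest)))
    return stats

# Hypothetical responses of five students to four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
for i, (p, r) in enumerate(item_statistics(responses)):
    print(f"item {i}: p = {p:.2f}, discrimination = {r:+.2f}")
```

With this toy data the last item has a negative discrimination: it is answered correctly mainly by students who do poorly on the other items, which is the kind of item a test constructor following this procedure would discard.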

The primary strengths of standardized tests are
that they are relatively easy to develop, inexpensive,
and convenient to administer. Furthermore, the results
are comprehensible since standard procedures are
followed. Their primary weakness is that they are often
used for decisions they were not designed to address. For
example, aggregating standardized scores for students in
a class (school, district, etc.) to produce a class profile of
achievement (class mean) is very inefficient: the tests
provide too little information in light of the high cost
involved. In fact, it has become clear that such tests are
of little value for most evaluations since the items are
not selected as representative of the mathematical
domains in the curriculum.

Unfortunately, in the United States their use
appears to be more strongly related to political, rather
than educational, purposes. For example, it is claimed that
elected officials and educational administrators
increasingly use the scores from such tests in comparative
ways to indicate which schools, school districts, and
even individual teachers give the appearance of achieving
better results (National Coalition of Advocates for
Students, 1985). Such comparisons are simply misleading.
One can only conclude that standardized tests are
unwisely overused.

Criterion-referenced tests are a product of the 
behavioral objectives movement in the 1960's. They 
were developed to provide teachers with an objective 
set of procedures with which to make instructional 
decisions. Item development was based on the identi- 
fication of a set of such behavioral objectives as: "the 
subject, when exposed to the conditions described in 
the antecedent, displays the action specified in the verb 
in the situation specified by the consequent to some 
specified criterion" (Romberg, 1976, p. 23). Items 
randomly selected from a pool designed to represent the 
antecedent conditions and the same action verb are 
given to students. From their responses, diagnosis of 
problems or judgments of mastery of objectives can be 
made. 
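The item-sampling and mastery logic described above can be sketched as follows. Everything specific here is a hypothetical assumption for illustration, not part of any published scheme: the objective (adding two two-digit whole numbers), the generated item pool, the 10-item sample, and the 80% mastery criterion. The second simulated pupil drops the carry from the ones column, showing how the pattern of missed items, not only the total, supports diagnosis.

```python
import random

def build_pool():
    """Hypothetical pool for the objective: add two two-digit whole numbers."""
    return [(a, b) for a in range(10, 100, 7) for b in range(10, 100, 11)]

def administer(pool, k, answer_fn, seed=None):
    """Draw k items at random from the pool and count correct answers."""
    rng = random.Random(seed)
    items = rng.sample(pool, k)
    correct = sum(1 for a, b in items if answer_fn(a, b) == a + b)
    return items, correct

def has_mastered(correct, k, criterion=0.8):
    """Judge mastery against an a priori criterion (proportion correct)."""
    return correct / k >= criterion

def competent(a, b):
    return a + b

def drops_carry(a, b):
    """Simulated error: adds the ones digits but never carries into the tens."""
    ones = (a % 10 + b % 10) % 10
    tens = a // 10 + b // 10
    return 10 * tens + ones

pool = build_pool()
for name, pupil in (("competent", competent), ("drops_carry", drops_carry)):
    items, correct = administer(pool, 10, pupil, seed=1)
    missed = [(a, b) for a, b in items if pupil(a, b) != a + b]
    print(name, correct, has_mastered(correct, 10), missed)
```

Only 10 of the 117 pool items are drawn, so different seeds give different tests; that is intended, since the random sample stands in for the whole set of antecedent conditions.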

Three features of these tests should be men- 
tioned. First, they usually are designed as part of a cur- 
riculum to be administered to individuals at the end of 
some instructional topic. Often, they are given indi- 
vidually, and teachers' judgments are made quickly. 
Second, they have occasionally been used in group
settings. For example, the comprehensive achievement
monitoring scheme (Gorth, Schriber, & O'Reilly, 1974)
periodically assesses student performance on a set of
objectives. Third, decisions about performance are
made with respect to some a priori criteria.

The strengths of objective-referenced tests lie in
their usefulness in instruction. As long as instruction
on some topic focuses on the acquisition of some specific
concept or skill, such tests can be used to indicate
whether or not the concept has been learned or the skill
mastered. Furthermore, such tests are scored easily and
are readily interpretable.

Four weaknesses need to be discussed. First, the
specification of a set of behavioral objectives fractionates
mathematical knowledge. In no way is it possible
to reflect the interrelatedness of concepts and procedures
in any domain. Second, objective-referenced
tests are costly to construct because hundreds of
objectives are included in any instructional program. Third,
simple aggregation across objectives is not reasonable
since objectives are interdependent. Fourth, and most
importantly, items for higher-level or complex
problem-solving processes are very difficult to construct and are
usually omitted. In fact, as used, these tests reinforce the
factory metaphor of schooling. They clearly do not
reflect how students reason about problem situations,
interpret results, or build arguments.

The problem faced by most program evaluators
in the 1960s was a residue of the "scientific" traditions.
The only evidence deemed of value was student
performance at the end of treatment compared with
that of an alternate treatment group, and the evidence
was gathered from either a standardized test or, later, a
criterion-referenced test. The results of such examinations
(e.g., the National Longitudinal Study of Mathematical
Abilities, Begle & Wilson, 1970) did not show
that the new program was uniformly superior to the old
program, but rather that different curricula are associated
with different patterns of achievement.

Policy Profiles 

Profile tests are intended to provide information on a variety of mathematical topics so that policy makers can compare individuals or groups in terms of those topics. Profile tests have become very popular. They have been developed for several major studies of mathematical performance, including the National Assessment of Educational Progress (NAEP) in the United States, the First International Mathematics Study (FIMS), the Second International Mathematics Study (SIMS), and the Assessment of Performance Unit (APU) in England.

Five features of profile assessments distinguish them from prior tests. First, they make no assumption of an underlying single trait; rather, the tests are designed to reflect the multidimensional nature of mathematical content. Second, items similar to those used in standardized or criterion-referenced tests are used. However, it must be acknowledged that the mathematics profiles developed by the APU in England (Foxman et al., 1980, 1981) differ from most other profile assessments in the choice and form of the items or exercises administered: their exercises include a variety of open-ended questions, performance tasks, and so on. Third, the unit of investigation is a group rather than an individual. Matrix sampling is usually used so that a wider variety of items can be included. Fourth, comparisons between groups are shown graphically on actual scores, so no transformations are needed. (See, for example, Figures 1 and 2.)
Finally, validity is determined in terms of content and/or curricular validity. Mathematicians and teachers are asked to judge whether individual items reflect a content-by-behavior cell in a matrix. In fact, the usual approach in profile testing is to specify a content-by-behavior matrix. For example, to establish a framework for an item domain, a content-by-behavior grid was developed for each target population in SIMS (Weinzweig & Wilson, 1977). The content dimensions for both the Grade 8 and Grade 12 populations were intended to cover all topics likely to be taught in any country. For Grade 8, the content outline contained 133 categories under five broad classifications: arithmetic, algebra, geometry, statistics, and measurement. For Grade 12, the content description was broader, containing 150 categories under seven headings: sets and relations, number systems, algebra, geometry, elementary functions and calculus, probability and statistics, and finite mathematics.

For each population in the SIMS study, the behavior dimension referred to four levels of cognitive complexity expected of students: computation, comprehension, application, and analysis. This classification is adapted from Bloom's taxonomy of educational objectives (1956). The adaptation involved replacing "knowledge" with "computation" and eliminating the higher levels of synthesis and evaluation. Data from such tests can then be reported in several ways. First, they can be reported in terms of item or cell means. For example, Figure 1 presents the means for six items on a topic (each given on a different instrument) for different students at different grades in the province of Ontario, Canada (McLean, 1982). Second, item scores can be aggregated by columns to yield cognitive-level scores or by rows to yield topic scores (see Figure 2).

A strength of profile achievement tests is that they can provide useful information about groups, and they are particularly useful for general evaluations of changes in educational policy that directly affect classroom instruction. However, profile achievement tests are weak in four specific areas. First, because they are designed to reflect group performance, they are not useful for individual ranking or diagnosis: an individual student takes only a sample of the items. Second, they are somewhat more costly to develop, and harder to administer and score, than prior tests. Third, because they yield a profile of scores, they are often difficult to interpret.

Finally, the primary weakness of most profile achievement tests lies in the outdated assumptions underlying the two dimensions of content-by-behavior matrices. The content dimension involves a classification of mathematical topics into "informational" categories. As I (1983) have argued:



"Informational knowledge" is material that can be fallen back upon as given, settled, established, assured in a doubtful situation. Clearly, the concepts and processes from some branches of mathematics should be known by all students. The emphasis of instruction, however, should be "knowing how" rather than "knowing what" (p. 122).

Furthermore, items in any content category are tested as if they were independent of one another, a practice that ignores the interconnections between ideas within a well-defined mathematical domain. Schoenfeld and Herrmann (1982) cautioned about the problems inherent in testing students on isolated tasks:

If they succeed on those problems, we and they congratulate each other on the fact that they have learned some powerful mathematical techniques. In fact, they may be able to use such techniques mechanically while lacking some rudimentary thinking skills. To allow them, and ourselves, to believe that they understand the mathematics is deceptive and fraudulent (p. 29).

Thus, the items should reflect the interdependence (rather than independence) of ideas in a content domain.

The behavior dimension of matrices has always posed problems. All agree that Bloom's Taxonomy (1956) has proven useful for low-level behaviors (knowledge, comprehension, and application) but difficult for higher levels (analysis, synthesis, and evaluation). Single-answer, multiple-choice items are not reasonable at those levels. One problem is that the Taxonomy suggests that "lower" skills should be taught before "higher" skills. The fundamental problem is the Taxonomy's failure to reflect current psychological thinking, and the fact that it is based on "the naive psychological principle that individual simple behaviors become integrated to form a more complex behavior" (Collis, 1987, p. 3). In the past 30 years, our knowledge about learning and how information is processed has changed and expanded.

In summary, profiling is important, but current profile tests fail to reflect the way mathematical knowledge is structured or how information is processed within mathematical domains.

Trends

In this section four trends are described. The first three are academic or theoretical trends apparent in the literature on assessment and evaluation. The last is a conservative political and practical trend which, in some respects, runs counter to the other trends.

The Trend in Program Evaluation 

Far from the limited alternatives of "treatment/control" or randomized designs (see Campbell & Stanley, 1966), contemporary evaluators have developed a diverse assortment of evaluation approaches from which to choose, given purpose, context, program stage, etc. In contrast to the "one right way" of the 1960s, today evaluators have multiple (and not always compatible) approaches. This trend began in the 1970s, when scholars trained in disciplines other than experimental psychology were asked to assist in educational evaluations. Scholars like Michael Young (1975), Michael Apple (1979), and Tom Popkewitz (1984), whose training was in anthropology, sociology, and political science, brought the information-gathering and analytic methods of those disciplines to evaluation. In fact, the list of names or designations for the new methods and models can be confusing to someone unfamiliar with the field of evaluation and the controversies that underlie the various empirical procedures. For example, the catalogue of choices now available to evaluators includes: goal-free evaluation (Scriven, 1974); advocate evaluation (Stake & Gjerde, 1974; Reinhard, 1972); connoisseurship (Eisner, 1976); user-driven evaluation (Patton, 1980); ethnographic evaluation (Fetterman, 1984); responsive evaluation (Stake, 1974); and naturalistic inquiry (Guba & Lincoln, 1981).

These diverse approaches to evaluation differ on many dimensions. Chief among them are the role of the evaluator (from educator to management consultant to assessor to advocate), the role of the client (from active stakeholder and collaborator to passive recipient of the evaluation product), the overall design (from experimental or quasi-experimental to exploratory), and the focus (on process, as in formative evaluation, or on outcome, as in summative evaluation). Each of these dimensions corresponds to the contingencies upon which evaluation choices are based: purpose, decision context, stage of program development, status of theory or knowledge base, etc. One consequence for product development was the specification of four stages of evaluation: (1) the product design stage, which involves developing a needs assessment; (2) the product creation stage, which involves gathering formative data to improve the product; (3) the product implementation stage, which involves demonstrating differences between products and making sure appropriate support services are available; and (4) the product illuminative stage, which involves an in-depth examination of how the product is actually used (Romberg, 1975).

Another consequence has been the use of a convergent strategy, i.e., using several different evaluation models with the same program. For example, in the IGE Evaluation Study which I directed (Romberg, 1985), we gathered data about reading and mathematics in schools in four phases. Phase 1 involved large-scale survey procedures (including the use of a standardized test). Phase 2 was a follow-up study examining the validity of the Phase 1 data. Phase 3 was an ethnographic study of six exemplary IGE schools. Finally, Phase 4 was a detailed examination of Grades 2 and 5 using time-on-task observations and the repeated administration of criterion-referenced tests.

Note also that evaluation experts began calling for better and different instrumentation to gather information about student performance. Overall, while program evaluation models have proliferated and the questions they must address have become clear, the information used to answer those questions too often still comes from inappropriate tests.

Only recently has it become apparent that the kind of evidence needed to judge many programs is, of necessity, different from that obtained from conventional assessment procedures. Tests given in a restricted format (e.g., multiple-choice items) and in a restricted time fail to capture the important aspects of doing mathematics. During the past decade researchers have developed a plethora of procedures for gathering information from students: think-aloud interview procedures, performance tasks, projects (both individual and group), hierarchical reasoning tasks, etc. Unfortunately, with one notable exception, these procedures have not been used in program evaluations because of the cost of administering them.

The exception is the evaluation of the Hewet Mathematics A Project in the Netherlands (de Lange, 1987). In that evaluation five different tasks were used to gather information: timed written tests, two-stage tasks, a take-home task, an essay task, and an oral task. Together, the information from the five tasks gave a far richer picture of how well students learned from that program than any one of them could have provided alone.

Trends in External Assessment 

While past assessment procedures are useful for some purposes and undoubtedly will continue to be used, they are products of an earlier era in educational thought. Like the Model T Ford assembly line, objective tests were considered an example of the application of modern scientific techniques in the 1920s. Today, we are both technologically and intellectually equipped to improve on outdated methods and instruments. The real problem is that all three forms of tests (profile, standardized, and criterion-referenced) are based on the same set of assumptions: an essentialist view of knowledge, a behavioral theory of learning, and a dispensary approach to teaching. It should be obvious that new assessment techniques need to be developed which are consistent with a different view of knowledge, learning, and teaching.

New evaluation models are being developed which demand new assessment procedures. One new approach is based on the specification of mathematical domains and the development of items that reflect each domain (Romberg, 1987). In turn, this assessment approach grows out of the extensive research on such domains. A good example is the work of Gerard Vergnaud with respect to "conceptual fields" (cf. Vergnaud, 1982). The principles followed in this approach include:

Principle 1. A set of specific and important mathematical domains needs to be identified, and the structure and interconnectedness of the procedures, concepts, and problem situations in each of the domains need to be specified.

Note that this approach differs from the current approach to specifying the mathematical content of a test in that networks are being defined rather than categories. This means that the interconnections of concepts and procedures with problem situations are as important as mastery of any node (e.g., a specific procedure). For example, consider the following exercise in second-grade addition and subtraction:



Sue received a box of candy for her 
birthday. She shared 27 pieces with her 
friends and now has 37 pieces left. How 
many pieces of candy were originally in 
the box?



To solve this exercise, a child would be expected first to represent the quantitative information with the subtraction sentence □ - 27 = 37. Second, the sentence should be transformed into the addition sentence 27 + 37 = □; then the addition is performed to yield an answer. What is important is that the child must know that separating situations can be represented by subtraction sentences, that subtraction sentences can be transformed into equivalent addition sentences, and that there are procedures for performing additions, etc. Each piece of knowledge, while important, contributes to a solution process, or way of reasoning about a situation, that is more important than any single concept or process.
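The solution path just described can be sketched in a few lines of Python. The function name and the final self-check are illustrative assumptions; the point of the sketch is that the docstring records the representation and transformation steps, and the code carries out only the final addition.

```python
def solve_start_unknown(given_away: int, remaining: int) -> int:
    """Find the original amount in a separating situation like the candy exercise.

    Representation:  start - given_away = remaining   (□ - 27 = 37)
    Transformation:  given_away + remaining = start   (27 + 37 = □)
    Computation:     carry out the addition.
    """
    start = given_away + remaining
    # Check the answer against the original subtraction sentence.
    assert start - given_away == remaining
    return start


print(solve_start_unknown(27, 37))  # 64
```

A child who can only compute, without knowing that the separating situation maps onto the subtraction sentence and that this sentence can be rewritten as an addition, would not reach the same answer reliably; that link between steps is what the domain-based approach wants to assess.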

Principle 2. A variety of tasks should be 
constructed that reflect the typical 
procedures, concepts, and problem situations 
of that domain. 

This is the key principle, in that the envisioned tasks are not just a more clever set of paper-and-pencil, multiple-choice test items. Although some typical test items may be appropriate for determining mastery of some specific concept or process, many of the tasks must be different. For example, some should be exercises that require the student to relate several concepts and procedures, such as the example from addition and subtraction given above. Some would ask students to communicate their understanding of a representation, such as the following graphical representation (see Figure 3).



The map and the graph below describe a car journey from Nottingham to Crawley using the M1 and M23 motorways.

[Map and graph not reproduced]

Describe each stage of the journey, making use of the graph and the map. In particular, describe and explain what is happening from A to B; B to C; C to D; D to E; and E to F.

Figure 3. Interpreting a Journey. (Swan, 1986, p. 12)
Other tasks may emphasize the level of reasoning associated with a set of questions about the same situation, such as the following superitem (see Figure 4). Still other tasks may ask students to carry out a physical process, such as gathering data, measuring an object, constructing a figure, or working in a group to organize a simulation. And still others may be open-ended, like the following "roller coaster" problem (see Figure 5).



This is a machine that changes numbers. It adds the number you put in three times and then adds 2 more. So if you put in 4, it puts out 14.

If 14 is put out, what number was put in?

If we put in 5, what number will the machine put out?

If we got out 41, what number was put in?

If x is the number that comes out of the machine when the number y is put in, write down a formula that will give us the value of y whatever the value of x.

Figure 4. An Example of a "Super Item". (Collis, Romberg, and Jurdak, 1986, p. 12)
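The arithmetic behind the four questions can be checked with a short Python sketch: the machine computes 3n + 2, and the last question asks for its inverse, y = (x - 2)/3. The function names are hypothetical, and the sketch models only the mathematics of the item, not how student responses to it are judged.

```python
from fractions import Fraction


def machine(n):
    """The superitem's rule: add the input three times, then add 2 (3n + 2)."""
    return 3 * n + 2


def input_for(output):
    """Inverse rule y = (x - 2) / 3, with x the output and y the input.

    Fraction keeps the result exact, so an output the machine cannot
    produce from a whole number shows up as a non-integer.
    """
    return Fraction(output - 2, 3)


print(machine(5))     # 17
print(input_for(41))  # 13
```

For example, `input_for(15)` returns `Fraction(13, 3)`, indicating that 15 cannot come out of the machine for a whole-number input. The later questions require students to reverse the rule and then to generalize it, which is the step up in reasoning the superitem is built to reveal.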




[Picture of the roller-coaster track not reproduced]

The picture above shows the track of a free-wheeling roller-coaster, which is travelling at a walking pace between A and B.

1. Write a paragraph describing how you think the speed of the roller-coaster varies as it travels along the track. (Use the letters A and 0 to help you in your description.)

2. Now sketch a graph which shows how the speed varies as it travels along the track. (Don't expect to get it right the first time.)

Figure 5. Interpreting a Roller-Coaster. (Swan, 1986, p. 12)
These illustrations demonstrate that there are several different aspects of doing mathematics within any mathematical domain. Assessing the level of maturity that an individual or group has achieved in a domain requires that a rich set of tasks be constructed.

Principle 3. Some tasks in a particular 
domain would be administered to students via 
tailored testing (and for groups via matrix 
sampling as well). 

Not all tasks for a domain need to be given to a student or group to determine the level of maturity. The technology is available to vary several aspects of any exercise or problem situation systematically. For example, for the subtraction exercise under Principle 1, one could vary the situation (join-separate, part-part-whole, comparison, etc.), the size of the numbers, the transformations, and the computational strategies (counting, algorithms, etc.).
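Systematic variation of this kind can be sketched as a cross product of task facets in Python. The facet names and values are hypothetical: the situations come from the text, the number sizes are invented, and "which quantity is unknown" is our assumed reading of "the transformations". Computational strategy is left out because it describes how a student solves a task rather than how the task is built.

```python
from itertools import product

# Hypothetical facets for a candy-type addition/subtraction task.
SITUATIONS = ("join-separate", "part-part-whole", "comparison")
NUMBER_SIZES = ("single-digit", "two-digit, no regrouping", "two-digit, regrouping")
UNKNOWNS = ("result", "change", "start")  # assumed reading of "transformations"


def task_variants():
    """Yield one task specification per combination of facet values."""
    for situation, size, unknown in product(SITUATIONS, NUMBER_SIZES, UNKNOWNS):
        yield {"situation": situation, "numbers": size, "unknown": unknown}


variants = list(task_variants())
print(len(variants))  # 27
```

Under these assumed facets, the candy exercise corresponds to the join-separate, two-digit-with-regrouping, start-unknown variant. Tailored testing would then draw only a few of the 27 variants for each student.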

Principle 4. Based on the tasks administered to a student in a domain, their complexity, and the student's responses to those tasks, the information should be logically combined to yield a score for that domain.

Note that this score is not just the number of correct answers a student has found. Instead, it would involve Boolean combinations of information (such as following inferential rules like "if ___ and ___, then ___"). The intent is that the score reflect the degree of maturity the student has achieved with respect to that domain. Note that this assumes all students are capable of some knowledge in several domains.
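Combining responses with inferential rules, rather than counting them, might be sketched as follows. The task labels U, M, R, and E (loosely modelled on the four questions of the number-machine item) and the rule set are hypothetical; the sketch illustrates only the "if ___ and ___, then ___" form, not any published scoring scheme.

```python
# Hypothetical rules: if every listed task is answered correctly, then level k.
RULES = [
    (frozenset({"U"}), 1),
    (frozenset({"U", "M"}), 2),
    (frozenset({"U", "M", "R"}), 3),
    (frozenset({"U", "M", "R", "E"}), 4),
]


def domain_score(correct: set) -> int:
    """Return the highest level whose rule is satisfied, or 0 if none is.

    `correct` is the set of task labels the student answered correctly.
    """
    level = 0
    for required, rule_level in RULES:
        if required <= correct:  # all required tasks were answered correctly
            level = max(level, rule_level)
    return level


# Both students answered two tasks correctly, yet the rules score them differently.
print(domain_score({"U", "M"}))  # 2
print(domain_score({"U", "R"}))  # 1
```

Under these rules a student who answers only E scores 0, whereas a simple count of correct answers would credit that student with one.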

Principle 5. Construct for each individual or group a score vector over the appropriate mathematical domains. Thus, for any individual one would have several scores (X1, X2, ..., Xn), where Xi is the score for a particular domain.

Note that this simply reinforces the notion that mathematics is a plural noun: mathematics encompasses several related domains.
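A score vector of the kind described in Principle 5 can be represented as a tuple indexed by domain. In this Python sketch the domain names and scores are hypothetical. The three students have the same total, yet their vectors, and therefore their profiles across domains, differ. The group vector is taken as the per-domain mean, which is one simple choice among several.

```python
from statistics import mean

# Hypothetical domains; each student's vector holds one score Xi per domain.
DOMAINS = ("additive structures", "multiplicative structures", "functions")

students = {
    "student_a": (3, 2, 1),
    "student_b": (1, 3, 2),
    "student_c": (2, 1, 3),
}


def group_vector(vectors):
    """Combine individual vectors domain by domain (the mean of each Xi)."""
    return tuple(mean(column) for column in zip(*vectors))


totals = {name: sum(vector) for name, vector in students.items()}
print(totals)  # every student totals 6
print(dict(zip(DOMAINS, group_vector(students.values()))))
```

Reporting only the totals would make these three students indistinguishable; the vectors keep apart what a single score collapses.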

In summary, awareness of a problem, such as the need for alternative testing procedures for school mathematics, does not mean solutions are easy. It may take years to replace current testing procedures in schools. Nevertheless, this should not deter us from exploring plausible alternatives. What is needed are tasks that provide students an opportunity to reflect, organize, model, represent, argue, etc. within specific domains. Constructing, scoring, scaling, and interpreting responses to such tasks for domains will not be easy, but it will be worth the effort in the long run.



Trends in Assessment by Teachers 

One striking consequence of the scientific, psychometric assessment procedures has been to deskill teachers. External objective assessment was deemed better than professional judgment. Today, too many teachers are no longer trained in evaluation and lack confidence in their ability to judge student performance (Apple, 1979). Aware of this, a trend to empower teachers is emerging. For example, the Graded Assessment Project in England (Close & Brown, 1988) provides teachers with procedures to assess performance. This theme is central to the North American NCTM Evaluation Standards (1988). It is also a major component of the Australian MCCP project (Clarke, 1987) and a focal part of the CGI research project at the University of Wisconsin (Peterson, Carpenter, Fennema, & Loef, 1987).

Practical-Political Trend 

In most of the world, it is generally agreed that the educational system as a whole, and the teaching of mathematics in particular, need to change. Demands are being made of governments, politicians, and administrators for funds to bring about this reform. In turn, of course, administrators have a right to demand evidence that their monies are well spent, that changes are made, and that the changes make a difference. Valid pupil performance data are the kind of information demanded.

However, governmental expectations about such data in the United States and Great Britain revert to the scientific-experimental notions of the past: behavioral objectives, norm-referenced scores, Bloom's Taxonomy, and so on. For example, "attainment targets" in the new national curriculum in Great Britain are merely new labels for behavioral objectives. The use of SIMS items for policy profiles (e.g., in Italy and in some parts of the United States) continues the practice of not assessing problem-solving strategies, communication skills, level of reasoning, etc. These, along with other examples, make it clear that there is considerable disparity between current theory and these practical demands. The demands for information are legitimate; the validity of the procedures is suspect.

Conclusions 

The field of assessment and evaluation has come a long way during the last quarter century. However, much remains to be done. In current assessments, growth within domains has been replaced by general levels of performance.

Unless changes are made in the way information is gathered from students, we will only contribute to the ongoing difficulties of sterile lessons, further deskilling of teachers, and so on. Instead, we need to conceive of curricular evaluations and of assessments of individual progress in light of mathematical maturity in specific domains.

1. Current testing procedures are unlikely to 
provide valid information for decisions 
about the current reform movement. 

Current tests reflect the ideas and technology of 
a different era and world view. They cannot assess how 
students think or reflect on tasks, nor can they measure 
interrelationships of ideas. 

2. Work should be initiated (or extended) to 
develop new assessment procedures. 

Only by having new assessment tools that reflect authentic achievement in specific mathematical domains can we provide educators with appropriate information about how students are performing. Of necessity, this implies that considerable funds be allocated for research and development. Only when new instruments are developed will we no longer be bound by old assessment procedures rooted in the traditions of the Industrial Age.

3. The emerging variety of evaluation models needs to use assessment procedures which reflect the changes in school mathematics.

Today, school mathematics is shifting its emphasis from drill on basic mathematical concepts and skills to explorations that teach students to think critically, to reason, and to solve problems. The criteria for judging the level of performance of a student or group of students should be based on these notions. This will involve the student's capability, when posed a problem situation in a specific mathematical domain, of communicating, reasoning, modeling, solving, and verifying propositions. Also, the index or scale developed to measure performance should reflect the student's level of maturity in that domain.
CURRICULUM-LINKED ASSESSMENT: A MODEL BASED ON THE
SECOND INTERNATIONAL MATHEMATICS STUDY



Kenneth J. Travers 



The Second International Mathematics Study (SIMS) was a comprehensive survey of the teaching and learning of mathematics in the schools of some twenty countries (educational systems) around the world. The Study was conducted under the aegis of the International Association for the Evaluation of Educational Achievement. In the Study, detailed information was obtained on the content of the implemented mathematics curriculum: what mathematics was actually taught by the teachers, and how that mathematics was taught. Student achievement and attitudes were assessed using internationally developed tests and questionnaires that were taken by random samples of mathematics classes in each country. The Study was targeted at 13-year-olds in most countries (12-year-olds in Japan and Hong Kong), and at those students at the end of secondary school who were enrolled in advanced college-preparatory mathematics courses. The younger group was called "Population A"; the older group, "Population B".

For each target population, the topics tested reflected an international consensus on the mathematical content judged to be important by panels of experts in each country. As a result, the fit of the tests to the curriculum varied somewhat from country to country. Data were obtained from teachers as to whether the content had been taught to the students who were tested. This information, called "opportunity-to-learn," provided a backdrop for interpreting the achievement scores. In each participating country, SIMS was carried out by a nationally recognized educational research institution under the direction of a national committee of specialists in mathematics education and educational research.

The First International Mathematics Study took place in 1964 in twelve countries. Eleven of these countries, including the United States and Japan, participated in the second study in 1980-82.

In the United States, students in approximately 500 mathematics classrooms in about 250 public and private schools randomly selected from across the country were tested at the end of the 1981-1982 school year. (A number of countries, including the United States, also tested the students at the beginning of the school year.) The countries (systems) taking part in the Study were:
Belgium (French & Flemish)
Canada (British Columbia & Ontario)
England and Wales
Finland
France
Hong Kong
Hungary
Israel
Japan
Luxembourg
Netherlands
New Zealand
Nigeria
Scotland
Swaziland
Sweden
Thailand
United States



The SIMS Model

The Second International Mathematics Study was based on three aspects of the curriculum: the intended curriculum, the implemented curriculum, and the attained curriculum. The intended curriculum is reflected in curriculum guides, course outlines, syllabi, and textbooks adopted by school systems. In most countries, national curricula emanate from a ministry of education or similar body. In the United States, such statements of intended goals and curricular specifications come from the Department of Education in each state and from local districts. Thus, it was considerably more difficult to describe the intended curriculum for the United States than for almost any other country that took part in the study.

The implemented curriculum focuses on the classroom, where the teacher interprets and puts into practice the intended curriculum. Teachers exercise their own judgment in translating curriculum guides and adopted textbooks into programs for their classes. Hence, their selection of topics or patterns of emphasis may not be consistent with those intended.

To identify the implemented curriculum, a number of questionnaires were developed for classroom teachers to complete. For example, teachers were asked whether or not they had provided instruction for each of the items on the achievement tests. They were questioned about such matters as the use of calculators in their classes. They were also asked to provide detailed information on the number of class periods that they devoted to specific topics and subtopics and on how they presented and interpreted this mathematical content to their classes.

The attained curriculum is what students have learned, as measured by tests and questionnaires. Extensive achievement tests were designed to assess student knowledge and skills in areas of mathematics designated as important and appropriate for the students being tested. The fit between these tests and the actual curricula in individual countries varied considerably. The tests contained items that were less appropriate in some countries than in others. Furthermore, the tests could not possibly contain an adequate range of items to fully represent all curricula in all countries.

The student outcome measures also included a 
number of opinion surveys and attitude scales. These 
were devised to elicit student views on the nature, im- 
portance, ease, and appeal of mathematics in general 
and of selected mathematical processes. 

The SIMS model provides an array of background information for viewing student outcomes. That is, one can regard cross-national patterns of achievement in the light of the content of the (intended) curriculum in each country and teacher coverage (opportunity-to-learn) of that content. The model therefore enables a triangulation on student outcomes. For some countries, two additional sources of data were available: (i) pretest data, since students were also tested at the beginning of the school year; and (ii) classroom process data, that is, detailed information on how the teacher handled the subject matter as it was presented during the school year.



SIMS as a Model for Assessment 

The SIMS model lends itself to powerful approaches to program (curriculum) assessment. Note, for example, Cronbach's (1964) distinction between every-pupil testing and evaluation for course improvement. Cronbach has noted that the concern in every-pupil testing is for precise and valid comparisons among individuals (for purposes, say, of making decisions about promotion, selection, or reporting). As he put it:

Much of test theory and test technology has been concerned with making measurements precise. Important though such precision is for most decisions about individuals, I shall argue that in evaluating courses we need not struggle to obtain precise scores for individuals... (p. 233)

SIMS, as an activity in program assessment (as contrasted with testing for making decisions about individual students), has the following features.

1. Curriculum Coverage — item sampling 

Since the interest in program assessment is not in scores for individual students, but in how well a body of subject matter has been learned by a cohort of students, SIMS used an item-sampling scheme for testing. Under this plan, a comprehensive set of mathematics items was responded to by the class as a whole; within the class, however, each subset of items was answered by only a fraction of the students. (In all, the eighth grade test has 180 items and the twelfth grade test has 136 items.)
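The item-sampling plan described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SIMS booklet design: the number of forms, the round-robin split of the item pool, and the helper names (`make_forms`, `assign_forms`, `item_p_values`) are all hypothetical. What it shows is that every item in the pool is answered by some students, so a class-level proportion correct can be estimated for every item even though no single student sees the whole test.

```python
from collections import defaultdict


def make_forms(item_ids, n_forms):
    """Split the item pool into n_forms disjoint subsets (round-robin)."""
    return [item_ids[i::n_forms] for i in range(n_forms)]


def assign_forms(student_ids, forms):
    """Hand out forms in rotation, so each form goes to about 1/n_forms of the class."""
    return {s: forms[k % len(forms)] for k, s in enumerate(student_ids)}


def item_p_values(responses):
    """Proportion correct per item, from (student, item, correct) triples.

    Each item's estimate uses only the students who actually received that item.
    """
    tally = defaultdict(lambda: [0, 0])  # item -> [number correct, number attempted]
    for _student, item, correct in responses:
        tally[item][0] += int(correct)
        tally[item][1] += 1
    return {item: right / tried for item, (right, tried) in tally.items()}


if __name__ == "__main__":
    # Hypothetical: a 180-item pool split into 4 forms for a class of 30 students.
    forms = make_forms(list(range(180)), 4)
    plan = assign_forms([f"student{k}" for k in range(30)], forms)
    print([len(f) for f in forms])  # [45, 45, 45, 45]
    print(sum(1 for f in plan.values() if f is forms[0]))  # 8 students receive form 0
```

Only the item-level estimates are meaningful here; no individual student has a score on the full pool, which is exactly the trade-off Cronbach describes for course evaluation.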

2. Test Scores 

In program evaluation, interest resides not in a single test score, but in achievement at the subscore and item level. As Harnisch and Linn (1981) point out, a score of 10 on a 20-item test could have been arrived at in 184,756 ways. Again, Cronbach (1964) states:

Outcomes of instruction are multidimensional, and a satisfactory investigation will map out the effects of the course along these dimensions separately... To agglomerate many types of post-course performance into a single score is a mistake, since failure to achieve one objective is masked by success in another direction. Moreover, since a composite score embodies (and usually conceals) judgements about the importance of the various outcomes, only a report that treats the outcomes separately can be useful to educators who have different value hierarchies. (p. 236)
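The figure of 184,756 quoted from Harnisch and Linn is the number of ways to choose which 10 of the 20 items a student answered correctly, that is, the binomial coefficient C(20, 10), assuming each item is simply scored right or wrong. A short Python check, with a hypothetical pair of students whose equal totals come from completely different items:

```python
from itertools import combinations
from math import comb

# Number of distinct sets of 10 correct items on a 20-item test.
n_patterns = comb(20, 10)

# Brute-force confirmation: enumerate every 10-item subset of 20 items.
n_enumerated = sum(1 for _ in combinations(range(20), 10))

# Hypothetical students: both score 10, yet they share no correct item.
student_a = set(range(0, 10))   # items 1-10 right
student_b = set(range(10, 20))  # items 11-20 right

print(n_patterns, n_enumerated)         # 184756 184756
print(len(student_a), len(student_b))   # 10 10
print(student_a.isdisjoint(student_b))  # True
```

A single total cannot distinguish these two students, which is the point behind reporting results at the subscore and item level.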

3. Curriculum-linked vs. curriculum-free testing

Much of large-scale testing in the United States entails tests of general intellectual development or aptitude that are often used as criteria for school achievement or effectiveness. SIMS, by contrast, focuses on the mathematical content of the curriculum, as found in the syllabus or textbook, as taught by the teacher and as learned by the student.

As Madaus (1979) has stated: 

It has been argued that although tests of general intellectual development or intelligence do not measure the behavioral objectives of specific programs, they are in fact the best criteria we have of general educational development (Cooley and Lohnes, 1976). This may be so, but it seems odd to measure what is admittedly a side effect of education while at the same time ignoring the more direct results of particular curricula and courses... Conclusions about the direct instructional effects of schools should not have to rely on evidence relating to skills taught incidentally. (Madaus et al., 1979)

Curriculum Analysis within the SIMS Framework 

SIMS was developed on the basis of a survey of the mathematics curriculum for each target population in each participating country. Consequently, information about the curriculum is a fundamental product of SIMS. A framework for studying the curriculum, and for developing the international item pools, was a grid that consists of rows for mathematical topics and columns for the behavioral levels at which the topics are considered.
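The topic-by-behavior grid can be represented as a table of counts keyed by (topic, behavioral level). The Python sketch below is illustrative only: the topic and level labels are assumptions chosen for the example rather than a transcription of the SIMS grid, and `build_grid` and `grid_rows` are hypothetical helper names. Classifying each item in a pool (or each objective in a syllabus) into a cell makes gaps and concentrations in the curriculum visible at a glance.

```python
from collections import Counter

# Illustrative labels (assumed for this sketch): rows are topics, columns are behavioral levels.
TOPICS = ["arithmetic", "algebra", "geometry", "statistics", "measurement"]
LEVELS = ["computation", "comprehension", "application", "analysis"]


def build_grid(items):
    """Count classified items per (topic, level) cell.

    items: iterable of (item_id, topic, level) triples.
    """
    grid = Counter()
    for item_id, topic, level in items:
        if topic not in TOPICS or level not in LEVELS:
            raise ValueError(f"item {item_id}: unknown cell ({topic!r}, {level!r})")
        grid[(topic, level)] += 1
    return grid


def grid_rows(grid):
    """One list of counts per topic, in LEVELS column order (empty cells are 0)."""
    return {topic: [grid[(topic, level)] for level in LEVELS] for topic in TOPICS}


if __name__ == "__main__":
    # Hypothetical classifications of three items.
    grid = build_grid([(1, "algebra", "computation"),
                       (2, "algebra", "application"),
                       (3, "geometry", "comprehension")])
    for topic, counts in grid_rows(grid).items():
        print(f"{topic:<12}", counts)
```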

The design of SIMS facilitates a detailed analysis of the curriculum for a country or educational system. Activities that may be undertaken at the system level to assemble information useful for understanding SIMS data include:

a. A detailed look at the system's mathematics curriculum from the perspective of the SIMS grid and item pool. Information is provided not only on the content dimension (what subject matter is in the curriculum), but also on the behavioral dimension (what cognitive levels are perceived to be emphasized in the curriculum).

b. Identification of curricular areas that are emphasized or not emphasized (again with respect to SIMS).

The above information is most useful in helping to interpret data from the teachers on opportunity to learn (e.g., To what extent do teachers cover vectors, an important topic in the curriculum?) and on student achievement (e.g., Can low student achievement in probability be attributed to low teacher coverage?).
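Questions of this kind (can low achievement on a topic be attributed to low coverage?) can be screened for by setting each topic's opportunity-to-learn figure beside its achievement figure. The Python sketch below is a hypothetical illustration: the cut-off values, the function names, and every number in the usage example are invented for the sketch and are not SIMS data. A topic flagged as low on both measures is consistent with, but does not by itself prove, a coverage explanation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def low_coverage_low_achievement(otl, achievement, otl_cut=50.0, ach_cut=50.0):
    """Topics whose OTL percent and mean percent correct both fall below the cut-offs."""
    return sorted(
        topic for topic in otl
        if topic in achievement and otl[topic] < otl_cut and achievement[topic] < ach_cut
    )


if __name__ == "__main__":
    # Hypothetical per-topic figures (percent), for illustration only.
    otl = {"probability": 30, "algebra": 90, "geometry": 70, "vectors": 45}
    ach = {"probability": 35, "algebra": 60, "geometry": 55, "vectors": 52}
    topics = sorted(otl)
    print(low_coverage_low_achievement(otl, ach))  # ['probability']
    print(round(pearson([otl[t] for t in topics], [ach[t] for t in topics]), 2))
```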

Why Replicate SIMS? 

SIMS provides a mechanism whereby an educa- 
tional system (a state, a province, or a school district) 
can obtain detailed information on its curriculum as in- 
tended, implemented and achieved. The exercise of 
analyzing the content of the intended curriculum can 
serve to identify, within a common framework, those 
aspects of mathematics that are emphasized and those 
that are of less importance. With such data in hand, 
curriculum supervisors are then able to make more 
informed decisions about the system's goals for mathematics education. Since the SIMS framework is international in scope, educational personnel have the opportunity to make comparisons not only within their own national system, but also with the systems of other countries.

At the level of the implemented curriculum, data on teacher coverage of various topics can be useful, for example, as a basis for designing in-service programs. Consider a system where achievement in probability and statistics is found to be much lower than desired. Assume further that teacher coverage of these topics is found to be low. It may be that the topics are in Chapter 15, at the end of the textbook, and teachers tend not to get that far. Or it may be that teachers avoid the topics because they feel unprepared to teach them. Programs of professional development could then be designed to help teachers manage instructional time so as to achieve better coverage of important topics. Alternatively, workshops could be devised to upgrade teachers' subject matter and instructional competencies.

Another use of a SIMS replication may be as a tool for assisting in the implementation of a new curriculum. In this time of curricular change, a variety of frameworks are being proposed for revising the instructional program of a school or district. The SIMS model may provide "benchmarks" for use in assessing the degree to which curriculum reform has occurred over a period of time. For example:

Intended Curriculum. The SIMS curriculum analysis can help identify aspects of a system's curriculum that are aligned with the desired plan (framework) as well as those aspects still needing refinement.

Implemented Curriculum. An analysis of the data on teacher coverage (opportunity to learn) can help identify topics and strategies that need further attention (say, through in-service programs).

Summary 

The Second International Mathematics Study 
is based on a model that views the curriculum as 
intended (e.g. content of syllabi, courses of study), as 
implemented (content actually taught by the teacher), 
and as attained (mathematics learned by the student). 
Consequently, patterns of achievement (either within 
or between educational systems) may be examined 
against a background of detailed information on the 
content of the curriculum both as intended to be taught 
and as actually (reported to be) taught. Such detailed curricular data may be useful to curriculum supervisors and evaluators, for example, as they assess present curricula, plan new programs and seek to document the extent to which curricular innovation has taken place.

The kinds of data which may be obtained from 
SIMS replications within countries include: 

a. Background data of a great variety: e.g., character- 
istics of schools, teachers and students. 

b. Curricular content data: e.g., what topics are in the curriculum for each target population in the various countries.

c. Teacher coverage data: e.g., between countries: What topics receive what level of coverage? Is algebra taught to junior high school age students (12-13 yrs.) in all SIMS countries? Within countries: Are all students, or only those in upper academic tracks, taught algebra?

CALCULUS IN THE HIGH SCHOOLS OF THE UNITED STATES OF
AMERICA AND CANADA (ONTARIO)

Michael K. Dirks • David F. Robitaille • John Leduc



The teaching of calculus at the secondary school level in North America has been and remains a controversial matter. On the one hand, some college and university mathematicians argue that this course belongs exclusively within their jurisdiction (Henry, Jones, and Kenelly, 1985). They may question the equivalence of the high school course with their own, as well as bemoan the lack of basic preparation in pre-calculus topics which incoming high school students have. On the other hand, some educators assert that in countries such as Japan and the Soviet Union a much greater proportion of the age cohort successfully studies calculus at the secondary level, and imply that this fact enhances the ability of these countries to compete industrially or militarily (Wirszup, 1980). In this report, data collected in 1981-82 as part of the Second International Mathematics Study are used to describe the teaching and the learning of calculus at the secondary level in the United States and in the Canadian province of Ontario. A number of achievement comparisons with other jurisdictions are also included to provide a better basis for drawing conclusions.

The Teaching of Calculus 

Approximately 125 Grade 13 classes from Ontario and 175 Grade 12 classes from the United States participated in the Second International Mathematics Study at the senior secondary level. Not all of these classes studied calculus, however, and the present report is based on responses from the 62 Canadian and 44 American classes that were in fact taking a calculus course, and whose teachers completed the Calculus Questionnaire.

Of the 44 American classes, 43 reported spending a full year on calculus and the other spent more than one semester but less than a year on the subject. Of the 62 classes in Ontario, 51 reported spending a full school year on calculus, 10 spent more than a semester but less than a year, and one class studied one chapter of calculus. Twenty-six of the teachers in the United States reported that their students took the Advanced Placement (AP) examination, with 23 taking the AB exam and 3 taking the BC exam. None of the Canadian schools indicated that their students had taken the AP examination.

Curriculum Materials and Course Content 

The two groups differ markedly in the textbooks they used, as well as in the amount of supplementary materials which teachers reported producing for their classes. The Canadian teachers used six different texts, with two of these texts accounting for 55 classrooms. At least a dozen different books were used in the United States, and only one was used in more than five classrooms. Of all the different texts in use, only two were used in calculus classrooms in both countries, and then in only five classrooms. The majority of the Canadian teachers reported using a text which provided a "somewhat intuitive treatment of calculus" and which "might be described as 'pre-college' calculus texts" (Alexander, 1987). Most of the American teachers reported using a standard, college-level calculus text.

While only a minority of teachers supplemented the text with materials which they developed themselves, those in Ontario did so more frequently than did their American counterparts. For example, 14 of the 36 Canadian teachers who taught integration by trigonometric substitution indicated that they had developed supplementary materials, while only 2 of 32 American teachers reported developing such materials for this topic.

The Calculus Questionnaire was designed to obtain information on the teaching of 21 topics. Teachers were asked whether or not they taught the topic; how it was taught (as new material, reviewed and extended, or assumed as prior knowledge); how difficult the topic was to teach and to learn; what influenced their decision to teach the topic; and whether or not the topic was in the text.

Responses from teachers in Ontario and from the United States indicated that 10 of the 21 topics were almost universally taught in both jurisdictions: limit of a function; limit as x approaches infinity; derivative of a polynomial function; derivative of a sum, a difference, a product, and a quotient of functions; chain rule for differentiation; implicit differentiation; related rates; relative extrema; definite integral as the area under a curve; and calculus of exponential and logarithmic functions. These topics were taught in most classes as new material, with a few of the American classes treating the topics as material to be reviewed and extended.

The percentages of teachers who included each of the 21 topics in their courses, or who assumed that a topic had been previously learned, are displayed in Table 1. Since teachers very rarely indicated that a topic was assumed as prior knowledge or reviewed, this information is not shown separately. Table 1 also includes the percentage of teachers who responded that a topic was in the student text. The results indicate that American students were more likely than students in Ontario to cover more of the typical first-year college calculus topics. This is particularly true for the topics of continuity, arc length, methods of integration, and indeterminate forms.
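The gaps behind this statement can be read directly from the "taught or assumed as prerequisite" columns of Table 1. The Python sketch below copies those percentages and ranks topics by the difference between the two jurisdictions; the function name `largest_gaps` and the 25-point threshold are arbitrary choices made for illustration.

```python
# Table 1, "taught or assumed as prerequisite" columns: (USA %, Ontario %).
TAUGHT = {
    "Limit of a sequence": (73, 95),
    "Limit of a function": (100, 97),
    "Limit as x approaches infinity": (98, 98),
    "Continuity": (100, 56),
    "Derivative of a polynomial function": (100, 100),
    "Derivative of a sum, difference, product or quotient": (100, 98),
    "Chain rule for differentiation": (100, 98),
    "Implicit differentiation": (100, 98),
    "Related rates problems": (95, 98),
    "Relative extrema": (98, 92),
    "Definite integral as area": (100, 94),
    "Arc length": (67, 34),
    "Calculus of logarithmic and exponential functions": (98, 92),
    "Indeterminate forms and l'Hopital's rule": (66, 18),
    "Integration by trigonometric substitution": (74, 57),
    "Integration by parts": (82, 54),
    "Integration by partial fractions": (57, 57),
    "Numerical integration": (64, 25),
    "Series": (24, 57),
    "Partial derivatives": (7, 10),
    "Multiple integrals": (2, 8),
}


def largest_gaps(table, min_gap):
    """Topics where the first column exceeds the second by at least min_gap points, largest first."""
    gaps = {topic: first - second for topic, (first, second) in table.items()}
    return sorted((t for t, g in gaps.items() if g >= min_gap), key=lambda t: -gaps[t])


if __name__ == "__main__":
    for topic in largest_gaps(TAUGHT, 25):
        usa, ont = TAUGHT[topic]
        print(f"{topic:<45} USA {usa:>3}  Ontario {ont:>3}  gap {usa - ont:>3}")
    # Swapping the columns shows where Ontario classes led instead.
    swapped = {t: (ont, usa) for t, (usa, ont) in TAUGHT.items()}
    print(largest_gaps(swapped, 20))  # ['Series', 'Limit of a sequence']
```

With a 25-point threshold the ranking is indeterminate forms, continuity, numerical integration, arc length, and integration by parts, which matches the topics singled out in the text; swapping the columns shows the two topics (series and limit of a sequence) on which Ontario classes led.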



Table 1

Frequency with Which Topics Appear in Classes and Texts

                                                    Taught or assumed      Present in text
                                                    as prerequisite (%)    (%)
Topic                                                 USA   Ontario           USA   Ontario
Limit of a sequence                                    73        95            68        70
Limit of a function                                   100        97            98        69
Limit as x approaches infinity                         98        98            96        66
Continuity                                            100        56            93        50
Derivative of a polynomial function                   100       100            98        81
Derivative of a sum, difference, product
  or quotient of functions                            100        98            98        78
Chain rule for differentiation                        100        98            96        81
Implicit differentiation                              100        98            96        81
Related rates problems                                 95        98            93        79
Relative extrema                                       98        92            96        73
Definite integral as area                             100        94            98        90
Arc length                                             67        34            82        35
Calculus of logarithmic and exponential functions      98        92            96        81
Indeterminate forms and l'Hôpital's rule               66        18            75        14
Integration by trigonometric substitution              74        57            93        41
Integration by parts                                   82        54            82        47
Integration by partial fractions                       57        57            80        46
Numerical integration                                  64        25            80        27
Series                                                 24        57            48        43
Partial derivatives                                     7        10            46         9
Multiple integrals                                      2         8            39         9


Data were also gathered on the number of class periods spent on each of the twenty-one topics, and results for the twelve most frequently taught topics are displayed in Figure 1. The boxplots indicate that American teachers tended to devote more periods to most of these topics than did their Canadian counterparts.

Figure 1. Duration of presentations for selected topics (number of class periods).



Classroom Activities, Instructional Aids, and Applications

Teachers were asked to estimate the percent of time within a typical period that was spent on each of three types of activities: teacher presentation or teacher-led review, teacher and student discussion of homework, and student activities and supervised study. The estimates were remarkably similar, with medians for both the United States and Ontario near 40, 30, and 20 percent respectively. Canadian teachers tended to use slightly more time for presentation and review and slightly less time on homework.

Few teachers of calculus appear to make use of teaching aids other than the textbook, the overhead projector, and the hand-held calculator. In particular, only one-fifth of the American teachers said they made use of movies in their courses, while almost no Canadian teachers did so. Twenty-five percent of Americans reported using physical models, compared with 15 percent of the Canadians. In contrast, calculators were used in almost every Canadian classroom and in 80 percent of American ones. Computers or microcomputers were used in only about ten percent of the classes surveyed; one would hope that this figure has risen sharply since these data were collected.

Calculus has applications in many fields of study in the physical, biological, and social sciences. However, these teachers reported that the vast majority of the time they devoted to applications in their calculus classes went to applications from physics and engineering. Applications from business ranked a distant third, as is shown in the boxplots in Figure 2.

Figure 2. Percent of time spent on applications from various fields (boxplots for Ontario and USA classes in physics, chemistry, engineering, biology, social science, and business).



Applications from chemistry, biology, and the social sciences were almost never considered. Given the importance that calculus now has in these areas and the number of students who will study them in college, this would seem to be an unfortunate state of affairs. Teachers, curriculum developers, and textbook publishers should be aware of the need to broaden their coverage of the applications of calculus, and calculus courses and textbooks should include many more examples of applications from other areas.

Teachers were also asked about sources of applications. Most teachers indicated that the applications they presented were drawn both from the textbook in use and from supplementary textbooks. In addition, 34 percent of the Americans and 60 percent of the Canadians reported creating applied problems themselves. Other sources, such as professional journals and meetings, were seldom mentioned. Teachers were asked specifically if they utilized the UMAP application modules, but none reported doing so. It must be added, however, that the UMAP materials were aimed at the university level rather than at the high school level.



Content-Specific Teaching Methods 

One of the unique aspects of the classroom process questionnaires developed for use in the Second International Mathematics Study was the collection of data related to the methods used by teachers to teach specific concepts and skills. The Calculus Questionnaire explored how teachers handled some of the basic formulas, concepts, and theorems of calculus.

Formulas: A number of formulas involving the derivative and the integral are usually developed in a first course in calculus. Each formula might be developed through a formal proof or an informal derivation, or it might be stated without any derivation or justification. Teachers' expectations for students might also vary. Teachers were asked about the teaching of six such formulas, and their responses are summarized in Figure 3.

Figure 3. Mode of presentation and level of expectation for various formulas: D(x^n), D(F/G), D[sec(u)], D[F+G], ∫sin^n(u) du, and D F[G(x)]. One panel, "Teachers Using Method (Percent)," shows the percent of teachers who gave a formal proof, gave an informal derivation, or just stated the formula (remainder did not cover). The other, "Teacher Expectations for Student Achievement (Percent)," shows the percent who expected a formal proof, an informal derivation, recall and use, or use when given (remainder did not study).

Taken at face value, it would seem that the Canadian teachers are much more formal in their presentations and expectations for these topics. Differences are especially large for student expectations. For example, slightly more Canadian teachers than American teachers gave a formal or informal derivation of the quotient rule for derivatives: 89 percent compared with 82 percent. Many more Canadian teachers, however, required that their students also be able to provide a justification of this result: 53 percent compared with 33 percent. Similarly, although Canadian teachers were only slightly more likely to provide a derivation of the chain rule for their students, they were four times as likely to expect that their students would also be able to justify this result. It can be argued that unless students are held responsible for justifying, at some level, the formulas they learn, as the Canadians are more apt to do, these formulas will be less meaningful and more easily forgotten.
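As an illustration of what an informal derivation of the quotient rule might look like, here is one common route. It assumes the product rule is already known and that F/G is differentiable wherever G is nonzero; establishing that second assumption is precisely what a formal proof has to add.

```latex
\begin{aligned}
Q &= \frac{F}{G}, \quad G \neq 0, \qquad\text{so}\qquad F = Q\,G \\
F' &= Q'\,G + Q\,G' \qquad\text{(product rule)} \\
Q' &= \frac{F' - Q\,G'}{G}
    = \frac{F' - \frac{F}{G}\,G'}{G}
    = \frac{F'\,G - F\,G'}{G^{2}}
\end{aligned}
```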

Questions were also asked relating to the teaching of four basic concepts: limits, continuity, the derivative, and the integral. The questions and responses are given below, and when these responses are compared to those in Figure 3 some interesting questions arise.

Limits and Continuity: To determine which of several approaches were used in the teaching of the concept lim f(x) = L as x approaches a, teachers were asked to indicate the methods which they used from several choices. The statement and choices that they were given and the frequency of responses are listed below in the box.

Several observations can be based on these responses. First, the frequency with which teachers reported using graphical arguments to support the concept of the limit of a function at a particular point is surprisingly low. It might be expected that graphs would be universally employed, but barely half of the Ontario teachers reported using them. The American teachers used graphical arguments more often, but over a third did not do so.

A second observation has to do with the epsilon-delta and deleted neighborhood approaches to limits. Both of these approaches were used much more frequently in American classrooms. A formal epsilon-delta definition was given by over 80 percent of the American teachers but by less than 20 percent of their Canadian counterparts. This is probably a reflection of the college texts usually used in the United States. Since the AB version of the Advanced Placement test does not include questions on epsilonics, preparation for this test would not in and of itself require teachers to use this formal approach to limits. Whether or not students' early encounters with the limit concept should involve epsilonics is a matter for debate. It would seem clear, however, that if such a formal approach is employed, it should be accompanied by a graphical interpretation. From the responses to this question, however, it appears that a sizeable number of teachers use epsilon-delta definitions without supporting graphs.

As with the questions on formulas, teachers were 
asked what was expected by way of student concept 
attainment. Specifically, they were asked what they 
expected their students to be able to do after the 
concept lim f(x)=L as x approaches a had been taught. 
The statement and choices that they were given and the frequency of responses are listed in the box below.



In the teaching of the concept of $\lim_{x \to a} f(x) = L$:

(a) I discuss how as x "gets close to a," f(x) "gets close to L."
Ontario 87% USA 82%

(b) I use the formal definition of epsilon and delta.
Ontario 18% USA 82%

(c) I use the concept of limit of a sequence.
Ontario 62% USA 21%

(d) I use the concept of elements in a deleted neighborhood of a being mapped into a neighborhood of L.
Ontario 3% USA 48%

(e) I develop it intuitively with graphical arguments involving the graphs of particular functions.
Ontario 52% USA 64%

(f) I did not discuss limits with the target class.
Ontario 3% USA 0%

After teaching the concept of $\lim_{x \to a} f(x) = L$, I expect my students to be able to:

(a) evaluate the limit of a first degree polynomial function.
Ontario 94% USA 89%

(b) state the epsilon-delta definition of the limit of a function.
Ontario 13% USA 71%

(c) give an epsilon-delta proof that the limit of $f(x) = 2x + 5$ is 9 as $x \to 2$.
Ontario 6% USA 77%

(d) give an epsilon-delta proof that the limit of f(x) = … is … as $x \to 2$.
Ontario 3% USA 52%

(e) use the epsilon-delta definition of a limit of a function to prove that if $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$, then $\lim_{x \to a} [f(x) + g(x)] = L + M$.
Ontario 2% USA 23%

(f) I did not discuss limits with the target class.
Ontario 3% USA 0%
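Item (c) rests on a one-line inequality: |f(x) - 9| = |2x - 4| = 2|x - 2|, so choosing delta = epsilon/2 guarantees |f(x) - 9| < epsilon whenever 0 < |x - 2| < delta. The Python sketch below spot-checks that choice with exact rational arithmetic at sample points on both sides of 2. It is a numerical sanity check, not a proof, and the function names are hypothetical.

```python
from fractions import Fraction


def f(x):
    return 2 * x + 5


def delta_for(eps):
    """delta = eps / 2, from |f(x) - 9| = 2|x - 2|."""
    return eps / 2


def spot_check(eps, delta=None, samples=1000):
    """Check |f(x) - 9| < eps at sample points with 0 < |x - 2| < delta.

    Uses delta_for(eps) unless a delta is supplied; returns False at the first violation.
    """
    d = delta_for(eps) if delta is None else delta
    for k in range(1, samples + 1):
        h = d * Fraction(k, samples + 1)  # 0 < h < d
        for x in (2 + h, 2 - h):
            if not abs(f(x) - 9) < eps:
                return False
    return True


if __name__ == "__main__":
    print(spot_check(Fraction(1, 100)))                        # True
    print(spot_check(Fraction(1, 10), delta=Fraction(1, 10)))  # False: delta too large
```

The second call shows why the factor of 2 matters: taking delta equal to epsilon lets |f(x) - 9| grow to nearly 2 epsilon.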



Since Canadian teachers seldom used epsilonics, it is not surprising that they seldom expected their students to do so. The rather high level of student expectation in the United States is somewhat surprising, however, considering the much lower expectations observed above for justifying formulas. Over 70 percent of the American teachers expected their students to be able to state an epsilon-delta definition and to employ it to justify limits of linear functions, and just over half expected epsilon-delta proofs for limits of simple rational functions. The expectation that students could prove that the limit of the sum of two functions equals the sum of the limits was much lower: just over 20 percent of the American teachers expected student proficiency for this task.

Teachers were also asked if they presented a formal definition for the concept of continuity of a function. The responses elicited are consistent with those for the limit of a function. Teachers in the United States were far more likely to use a formal epsilon-delta approach than those in Ontario. Almost 70 percent of the American teachers used a formal definition to prove that functions were continuous at specific points, compared to just over 20 percent of the Canadians.

Teachers were then asked what they expected their students to be able to do after the concept of continuity had been taught. While almost 70 percent of American teachers expected their students to be able to state and apply a formal definition of continuity, the corresponding figure in Ontario was only 14 percent.

Clearly, teachers in the two jurisdictions held differing views on the importance of continuity. At the most basic level, nearly all of the American teachers expected their students to be able to identify graphs of continuous and discontinuous functions, compared with just half of the Canadians.

Derivative: Teachers were asked how they introduced the derivative of a function f(x) at x = a. The differences so notable in the approaches to the first two concepts practically disappeared in this section. However, the similarity may be misleading, given the underlying differences in approach to limit and continuity. The statement, choices and frequency of responses are listed below.

When introducing the derivative of a function f(x) at x = a, I discuss:

(a) the rate of change of y (= f(x)) with respect to x in the function.
Ontario 73% USA 73%

(b) the limiting position of a secant line connecting the points (x, f(x)) and (a, f(a)), as x approaches a, on the curve y = f(x).
Ontario 94% USA 86%

(c) $\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$.
Ontario 87% USA 82%

(d) and use a formal definition of the derivative such as $f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$.
Ontario 95% USA 98%

(e) I did not discuss derivatives with the target class.
Ontario 0% USA 0%



While a formal definition was almost universally given for the derivative, over a quarter of the teachers in both jurisdictions reported that they did not interpret the derivative as a rate of change. Considering the importance of this notion in applying the derivative, such an omission seems quite odd.

Teachers' expectations for their students are pictured through the following responses:



Consider the following form of the definition of the derivative of a function f at a point a:

$f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$

(a) I do not present this definition, or any equivalent form, to my students in the target class.
Ontario 0% USA 2%

(b) I present this definition, but I do not expect the students to either remember it or use it.
Ontario 3% USA 2%

(c) I present the definition and I expect the students to be able to use it in finding the derivatives of specific functions, such as finding f'(2) for a given polynomial function f(x).
Ontario 86% USA 96%

(d) I present the definition and expect the students to be able to use it in deriving general results about derivatives.
Ontario 71% USA 64%

(e) I present the definition and expect the students to be able to use it in testing functions, such as y = |x|, for the existence of derivatives at points such as x = 0.
Ontario 14% USA 50%
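For item (e), the usual test case is the absolute-value function y = |x| at x = 0 (taken here as an assumption about the function intended). The difference quotient from the definition of f'(a) equals +1 for every h > 0 and -1 for every h < 0, so the two one-sided limits disagree and the derivative does not exist at 0. The Python sketch below evaluates those quotients exactly for a few step sizes and, for contrast, does the same for x squared; the helper names are hypothetical, and the numbers illustrate the limits rather than prove them.

```python
from fractions import Fraction

STEPS = (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000))


def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h, the expression inside the limit defining f'(a)."""
    return (f(a + h) - f(a)) / h


def one_sided_quotients(f, a, steps=STEPS):
    """Difference quotients for h > 0 (right side) and h < 0 (left side)."""
    right = [difference_quotient(f, a, h) for h in steps]
    left = [difference_quotient(f, a, -h) for h in steps]
    return right, left


if __name__ == "__main__":
    right, left = one_sided_quotients(abs, 0)
    print(right, left)  # every right quotient is 1, every left quotient is -1
    right, left = one_sided_quotients(lambda x: x * x, 0)
    print(right[-1], left[-1])  # 1/1000 -1/1000: both sides approach 0
```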

These responses indicate that teachers in both places expect that students will be able to state and apply the definition of derivative. Canadian teachers apparently expect their students to be able to use the definition to derive such results as the sum and product rules somewhat more frequently than the American teachers, who were much more apt to expect their students to use the definition to test specific functions for derivatives at particular points.

Integration: The last major concept covered in the questionnaire was the definite integral. The questions were limited to an outline of a definition of this concept. The statement, choices and frequency of responses are given below:


Consider a formal definition of the definite integral as the limit of a Riemann sum, which might involve:

$\int_a^b f(x)\,dx = \lim_{\|\Delta\| \to 0} \sum_{i=1}^{n} f(\xi_i)\,(x_i - x_{i-1})$

(a) I do not present this definition of the definite integral.
Ontario 51% USA 16%

(b) I present this definition but do not use it.
Ontario 14% USA 18%

(c) I did not discuss any interpretation of the definite integral with the target class.
Ontario 0% USA 0%

(d) I present this definition and use it to find the value of certain definite integrals, e.g., $\int_0^1 x^2\,dx$.
Ontario 25% USA 61%

(e) I present this definition and use it to present some general theorems about the definite integral.
Ontario 2% USA 2%

(f) I present this definition and then immediately drop it in favor of specific rules for evaluating definite integrals.
Ontario 8% USA 2%

(g) I did not discuss the definite integral with the target class.
Ontario 0% USA 0%
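Item (d) describes using the definition to evaluate a specific integral. The Python sketch below does this for the integral of x squared from 0 to 1, taken as the example, using a uniform partition and either right or left endpoints as the sample points (both simplifying assumptions). With exact fractions, the right-endpoint sum for n subintervals equals (n + 1)(2n + 1)/(6n^2), which tends to 1/3, in agreement with the antiderivative x^3/3 evaluated from 0 to 1.

```python
from fractions import Fraction


def riemann_sum(f, a, b, n, rule="right"):
    """Sum of f(xi_i) * (x_i - x_{i-1}) over a uniform partition of [a, b] into n pieces.

    rule="right" samples each subinterval at its right endpoint, "left" at its left endpoint.
    """
    if rule not in ("right", "left"):
        raise ValueError("rule must be 'right' or 'left'")
    width = Fraction(b - a) / n
    total = Fraction(0)
    for i in range(1, n + 1):
        x_prev, x_i = a + (i - 1) * width, a + i * width
        sample = x_i if rule == "right" else x_prev
        total += f(sample) * (x_i - x_prev)
    return total


if __name__ == "__main__":
    square = lambda x: x * x
    for n in (10, 100, 1000):
        upper = riemann_sum(square, 0, 1, n, "right")
        lower = riemann_sum(square, 0, 1, n, "left")
        print(n, float(lower), float(upper))  # the two sums close in on 1/3
```

Because x squared is increasing on [0, 1], the left and right sums bracket the integral, and their difference is exactly 1/n.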






The responses indicate that while teachers in both jurisdictions discuss and interpret the definite integral, only half of the Canadians use a formal definition based on the idea of a Riemann sum. The Canadian teachers either teach the definite integral very informally or they use a different definition. The majority of American teachers employ this definition to evaluate specific integrals. Teachers were asked in another questionnaire item if they interpreted the definite integral in terms of area under the graph of a function and as the work done by a variable force. Teachers universally used the first interpretation, while only a single American teacher used the second.

The last questionnaire item dealing with the concept of integration asked teachers how they dealt with the sequence of introducing the definite integral and the notion of the anti-derivative. The vast majority, 84 percent of the Canadian teachers and 91 percent of the American teachers, indicated that they introduced the anti-derivative first. Ten and five percent, respectively, reversed the sequence. The remaining teachers, about five percent in both cases, did not teach the concept of the anti-derivative.

Major Theorems: The questionnaire probed the presentation of theorems through a set of choices dealing with three major theorems; the teachers' responses are displayed in Figure 4.




Figure 4. Presentation of three important theorems (legend: presented with justification; presented with proof; remainder did not present).
A marked difference between the high school calculus courses in the United States and Ontario occurs for these three theorems. While all three theorems were universally part of the American curricula, this was not the case in Ontario. None of the Canadian teachers included the Intermediate Value Theorem in their courses, and only ten percent included the Mean Value Theorem. Sixteen percent of Canadian teachers did not present the Fundamental Theorem of Calculus to their students. It is most surprising that this theorem would ever be omitted in a calculus course which, as these did, includes both differentiation and integration.



The Learning of Calculus 

The learning of calculus at the secondary school level in North America is examined using achievement results on 25 of the test items that were used in the Second International Mathematics Study (SIMS) for the 62 Canadian and 44 American classes discussed above. Of the 25 items, 13 dealt with differentiation topics while 12 dealt with integration topics. The items will be referred to here by their SIMS code numbers. Based on the teachers' reports, all but two of these items, numbers 73 and 122, can be considered part of the curricula of virtually all 106 classes. These items test basic material that does not extend beyond the first semester of a typical college course in calculus.



Evaluating achievement results is a complex and often contentious matter. What might appear to be a satisfactory level of performance to one observer might be judged quite inadequate by another. In the final analysis, each reader must render his or her own informed judgement. To assist in the process, comparative achievement data are provided for several other jurisdictions which participated in SIMS, viz. England and Wales, Japan, New Zealand, and Sweden. In Japan, New Zealand, and Sweden approximately the same percent of the age cohort, 11 or 12 percent in each case, were enrolled in a course in which calculus was studied. For England and Wales and for the United States the corresponding figures were six and five percent, respectively. For Ontario Grade 13 the figure was 19 percent. Thus, the English and American groups are the most elite, while the Canadian group is the least elite. This obviously has some bearing on the level of performance to be expected from each group. Achievement levels for the items and for groups of items for each of the six jurisdictions are shown in Table 2.



In Table 2 the differentiation and integration items have each been divided into two groups. The items grouped together as Differentiation 1 and Integration 1 deal with basic techniques, while the items grouped together as Differentiation 2 and Integration 2 deal with simple applications. Considering the entire test, the performance of the Canadian students is the lowest of all of the jurisdictions surveyed. The Americans performed at a level comparable to students in Sweden and New Zealand, but substantially below students in England and Wales and in Japan.



Table 2

Outcomes for 25 Calculus Test Items
(Percent Correct)

Item or Group       United States   Canada (Ontario)   Japan   Sweden   England & Wales   New Zealand
14                  82              79                 77      72       77                85
72                  62              63                 56      52       70                58
106                 74              63                 73      57       80                84
118                 61              41                 62      34       63                43
Differentiation 1   70              62                 67      54       73                68
28                  57              56                 6?      25       50                43
57                  24              22                 54      16       39                12
88                  54              60                 54      44       68                65
104                 55              46                 74      59       59                50
111                 29              30                 56      43       41                29
112                 47              35                 62      47       49                32
117                 54              58                 86      59       80                58
119                 49              42                 60      39       48                48
122                 25              20                 56      27       31                37
Differentiation 2   44              41                 63      40       52                42
15                  85              74                 83      71       88                83
73                  19              21                 51      28       47                25
103                 71              58                 80      78       81                67
107                 80              75                 74      53       86                74
109                 20              24                 67      56       35                14
113                 69              56                 73      75       78                60
114                 39              28                 51      25       38                32
116                 39              33                 41      28       46                3?
Integration 1       52              46                 56      52       62                49
29                  60              54                 81      74       67                70
44                  11              25                 58      59       46                47
58                  32              38                 66      43       41                26
115                 41              26                 55      48       40                44
Integration 2       40              36                 65      56       49                47
All Items           50              45                 63      48       58                49
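Table 2 does not say how the group rows were computed. Assuming each is the unweighted mean of its items' percent-correct values, rounded half up (an inference from the printed figures, not a documented procedure), a short Python sketch reproduces the Differentiation 1 row from items 14, 72, 106 and 118. The names `DIFFERENTIATION_1` and `group_score` are illustrative. Half-up rounding matters here: Python's built-in `round` rounds halves to even, which would give 72 rather than the printed 73 for England and Wales.

```python
import math

# Percent correct on the Differentiation 1 items (SIMS codes 14, 72, 106, 118),
# transcribed from Table 2; the dictionary name is illustrative.
DIFFERENTIATION_1 = {
    "United States": [82, 62, 74, 61],
    "Canada (Ontario)": [79, 63, 63, 41],
    "Japan": [77, 56, 73, 62],
    "Sweden": [72, 52, 57, 34],
    "England & Wales": [77, 70, 80, 63],
    "New Zealand": [85, 58, 84, 43],
}


def group_score(item_scores):
    """Unweighted mean of item percent-correct values, rounded half up.

    Assumes non-negative scores, for which floor(x + 0.5) rounds halves up.
    """
    mean = sum(item_scores) / len(item_scores)
    return math.floor(mean + 0.5)


if __name__ == "__main__":
    for system, scores in DIFFERENTIATION_1.items():
        print(f"{system}: {group_score(scores)}")
```

Run as a script, it prints 70, 62, 67, 54, 73 and 68, matching the printed row. Other rows do not all reproduce exactly under the same rule (and two cells are illegible in the source), so this is best read as a consistency check rather than as the study's own computation.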
Achievement in Advanced Placement Classes

Given the rather lackluster achievement of the 44 American calculus classes on the calculus items, it is of interest to know whether the 26 classes that were preparing to take the College Board Advanced Placement (AP) exam did any better than the other 18 classes. Williams (1986) addressed this question by analyzing the results on 46 of the test items. His subtest included some precalculus items as well as some calculus items that were not included in the above analysis. Since both a pretest and a posttest were given in the United States, it was possible to determine whether there was a difference between the two groups at the beginning of the calculus course. Williams' analysis showed that no statistically significant difference existed in pretest scores. By the end of the year, however, his analysis showed that the AP classes had learned more. They scored an average of 14, 8, 5 and 7 percent higher than the non-AP classes on the limits and continuity, differentiation, and applications of differentiation subtests, and on the total analysis test, respectively. These results were significant at the 0.05 level, indicating a high probability that AP classes in the United States outperformed non-AP classes on these topics. The AP classes also did better on basic integration, but not at a statistically significant level. The AP classes did not do better than the non-AP classes on applications of integration.

Williams could not explain, on the basis of his data, why the AP classes outperformed the non-AP classes. This result is, however, consistent with other research showing that AP high school students in the United States tend to achieve as much as or more than American university students studying calculus (Haag, 1977; Dickey, 1982).



Summary and Conclusions 

There seem to be major differences between the calculus courses offered in Ontario and in the United States. Many of these differences probably stem from differences in the basic texts being used. In the United States college-level calculus texts predominate, while in Ontario the texts used most frequently have apparently been written specifically for the secondary level.

The American classroom curriculum is generally more extensive at both the beginning and the end of the course. In the United States much more emphasis is put on the foundation areas of limits and continuity, with epsilon-delta definitions and proofs playing a key role in classroom presentations as well as in student learning objectives. The Intermediate Value and Mean Value Theorems are studied in the United States but rarely or not at all in Ontario. Finally, such topics as arc length, integration by parts, indeterminate forms, and volumes of solids of revolution are taught more often in American high school calculus courses than in such courses in Ontario.

American teachers who took part in this study emphasized the foundation areas more often and usually expected their students to be able to do epsilon-delta proofs. They did not, however, justify the formulas which they presented later in their course as often as the Canadians did. Canadian teachers, for example, were much more apt than the American teachers to give their students a formal proof of the chain rule. There was also a closer relationship in the Ontario classrooms between the level of rigor in teacher presentations and the level expected of students for these formulas.

The majority of teachers used graphs to supplement algebraic presentations. However, while one would expect all teachers to use graphs in discussing the limit of a function at a point and in presenting epsilon-delta arguments, a large number of teachers in both jurisdictions chose not to do so.

The calculator was the one instructional aid, in addition to the basic text, used by the vast majority of all teachers. Only a relatively small number of teachers used movies, models, or computers. Applications of calculus were almost always drawn from the context of the physical sciences and were either taken from textbooks or created by the teacher.

The achievement of calculus students in both the United States and Ontario is probably less than satisfactory overall, and certainly so for those items dealing with applications of basic concepts. American students tended to outperform their Canadian counterparts, scoring five percentage points higher on the 25-item test. All seven of the items on which one jurisdiction outscored the other by more than ten percentage points were in favor of the United States.

A key element in assessing these achievement results is the percent of the age cohort served by the high school calculus classes in each jurisdiction. With this in mind, the overall results achieved in Ontario's Grade 13 classes tend to look better, while those in the United States tend to look worse. In Ontario about 19 percent of the age cohort are enrolled in Grade 13 calculus. This is a comparatively high percentage and can be used to justify somewhat lower achievement results than might otherwise be considered satisfactory. Achievement on the integration items in Ontario must still be considered poor, however.

The American achievement results must cause a good deal of concern when one considers both the very elite nature of the high school calculus population and the high degree of appropriateness of the 25 items to the basic course content, as reported by the teachers. American results were only slightly above those obtained in Sweden and New Zealand, where the percent of the age cohort enrolled in calculus is twice as large as in the United States. The results were considerably below those obtained in England and Wales and in Japan; only in England and Wales is the calculus population about as small a percent of the age cohort as in the United States. Certainly, these results should cause American mathematics educators to reflect on the expectations which exist for calculus instruction in American schools, as well as on the adequacy of the precalculus instruction which students are currently receiving. The situation appears less problematic where Advanced Placement programs are in place, however.
PARTICIPATION AND OPPORTUNITY TO LEARN AS FUNCTIONS OF
STRUCTURAL & ORGANIZATIONAL FACTORS OF SCHOOL SYSTEMS
When international comparisons are made using data from International Association for the Evaluation of Educational Achievement (IEA) studies, the focus most often is on the results of the achievement tests. There is a general interest in knowing which systems do best and which do not do so well when comparing test scores, which presumably reflect more or less knowledge and skill in a particular content area.

There are, obviously, other ways to compare these systems. One such way, and the theme of this paper, is to look at how the policies and practices of different educational systems distribute opportunities to their students. The results of the Second IEA Mathematics Study (SIMS) are particularly appropriate in this regard, since mathematics is perceived as such a crucial subject area in virtually all systems.

Which types of students are given which types of opportunities within these varied educational settings is the focus of this paper. Questions of when and how students are selected into different curricula are considered paramount, since that selection determines the kind and amount of mathematics to which a student will be exposed.

The Samples 

Two groups of students were targeted for the study: one of students 13 years old, and a second that included students in the final year of secondary school enrolled in advanced, university-preparatory mathematics courses. Those samples were justified on the following basis. Population A (students 13 years old and typically in grade eight) was chosen because it may be, and often is, the last grade at which all students are taking mathematics. Hence, grade eight represents the point where a system gives the minimum amount of mathematics to all its students. The second group, Population B, is a sample of those students who theoretically have received the most mathematics that a system delivers prior to university or tertiary education. In the SIMS study, these students are the mathematics specialists in the secondary schools of these educational systems.

This paper examines results from both populations. Who participates in which kinds of mathematics available at grade eight serves as an indicator of how opportunities are distributed within an educational system at a point when every student is taking mathematics. How much mathematics is given, to how many students, and to which kinds of students in Population B is of interest because it reflects the importance of mathematics to that system.
Results discussed in this paper come from a subset of the systems that participated in the survey. The reason for not using results from every system is that the surveys were implemented differently from system to system. Crucial to a discussion of Population A results are measures taken both early in the school year (a pretest) and at the end (a posttest). Only eight systems implemented studies with those features. For the Population B section of the paper, two systems (Hungary and the United States) are featured. These two were chosen because they differ in their approach both to the retention of students in school and to the exposure of students to advanced mathematical knowledge.

The Symbolic Importance of Mathematics

IEA's second mathematics study was first of all a study of mathematics (issues of curriculum, students' achievement, and pedagogy were emphasized), but because of the place of mathematics in schools, it could not be only that. With increasing demands for technical expertise coming from the broader social arena, greater weight is placed on mathematical skills and talents than on outcomes from exposure to other traditional content areas. In order to be in greater demand in the job market, or to qualify for more prestigious higher education, a student must navigate the best mathematics a system has to offer. Since the school has a virtual monopoly on such training, the demands on students and schools are obvious. If success in mathematics is a key to later success, and schools determine who gets what mathematics, then mathematics becomes a symbol of modern schooling.

Variation in Tracking Policies - What Mathematics 
for Which Students? 

Because a pretest was administered at the beginning of the school year in eight of the systems, it is possible to describe the allocation of students to classrooms and schools. Differences in average scores between classrooms within schools, and between schools, reflect the policies adopted on whether or not students are sorted and tracked into more or less homogeneous classrooms or schools.

If, for instance, students were assigned randomly to classrooms within a school (or systematically assigned to classrooms to ensure heterogeneity), one would expect the average pretest score for each class to be about equal. If students were assigned to classrooms according to achievement levels in previous grades or on the basis of an aptitude test, with the higher scorers assigned to one class and the lower scorers to another, one would expect average scores to vary greatly between classrooms in the same school. Similarly, if students attended schools on a random basis (or were assigned to schools to promote equity), school averages would be about equal. If, however, there were selection according to prior achievement, or if students whose background characteristics correlated with achievement were clustered in separate schools, then one would expect substantial differences between school means.



Figure 1. Variance decomposition of Population A pretest.

Figure 1 contains the results of a variance decomposition of the Core pretest in each of the eight "longitudinal" systems. In that figure, the areas of the circles are roughly proportional to the total variance in achievement for each system. The wedges within the circles represent the percentages of total variation that are found between students, classrooms, and schools. Circles that contain only two wedges depict systems that did not sample two classrooms per school. In those cases, the variation is labelled student and school variation, although in theory the school wedge contains both the classroom and the school variance. That is, the between-classroom variation, if any exists, is part of the between-school variation, not the between-student variation.

Issues of exposure to instruction and participation in mathematics in Population A are tied to policies on how students are allocated to classrooms or schools. If there is a common curriculum and no attempt is made to place particular students in special classes, then participation and exposure are more or less common for each student. If there is some kind of tracking within the system, then questions can be asked about whether decisions to track are related to the kinds of mathematics experience students are given.

The differences between systems are dramatic. Not only are the total variances (individual differences within a system) of strikingly different magnitudes, but that variation is also divided in distinct ways (reflecting how individual differences are responded to). In Japan, for instance, almost all of the large total variation is between students. Since there is such a small amount of between-school variation, variation between classes in the same school must likewise be small.

One can infer that the Japanese either ignore individual differences when assigning students to classrooms, or implement policies that produce equality among classrooms and schools. There is no homogeneous grouping in mathematics in Japanese schools at this grade level, and there appears to be no sorting by school.

In bold contrast to the Japanese pattern of variation stands that of the United States. There the between-classroom component is the single largest component, and it exceeds the comparable values in all of the other systems.

Other systems, too, have distinctive patterns. New Zealand, despite the fact that it purports to have a national curriculum, shows a pattern very similar to that of the United States. The between-school differences in Belgium Flemish are a reflection, one assumes, of the fact that there are different school types (vocational, general and technical) and different organizing authorities for students at this grade level. The between-school differences in Thailand can be hypothesized to reveal differences between urban and rural schools, while those in British Columbia apparently are related both to systematic allocation of students to classrooms (despite a provincial policy to the contrary) and to different demographic characteristics of the schools. The patterns in France and Ontario show minor differences both between classrooms and between schools, and in neither case are they of the magnitude of the classroom differences in the United States or the school differences in Belgium Flemish. It appears, therefore, that neither the French nor the Ontario schools have yet begun to sort according to measures of achievement or aptitude.
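The wedges in Figure 1 split each system's total pretest variation into student, classroom, and school parts. The paper does not describe its estimation method (IEA analyses would more typically fit a variance-components model), so what follows is only a minimal descriptive sketch that swaps in a plain nested sum-of-squares split; the function name `decompose` and the nested-dictionary input layout are hypothetical. It also shows why the two-wedge circles fold classroom variation into the school wedge: with only one sampled classroom per school, each classroom mean equals its school mean and the classroom term is zero.

```python
from statistics import mean


def decompose(scores_by_school):
    """Split the total sum of squares of nested pretest scores into
    between-school, between-classroom (within school) and between-student
    (within classroom) shares.

    scores_by_school: {school: {classroom: [score, ...]}}  (hypothetical layout)
    """
    all_scores = [x for rooms in scores_by_school.values()
                  for scores in rooms.values() for x in scores]
    grand_mean = mean(all_scores)
    ss_school = ss_classroom = ss_student = 0.0
    for rooms in scores_by_school.values():
        school_scores = [x for scores in rooms.values() for x in scores]
        school_mean = mean(school_scores)
        ss_school += len(school_scores) * (school_mean - grand_mean) ** 2
        for scores in rooms.values():
            class_mean = mean(scores)
            ss_classroom += len(scores) * (class_mean - school_mean) ** 2
            ss_student += sum((x - class_mean) ** 2 for x in scores)
    # The three nested pieces add up to the total sum of squares
    # about the grand mean.
    total = ss_school + ss_classroom + ss_student
    if total == 0:
        raise ValueError("all scores are identical; nothing to decompose")
    return {"school": ss_school / total,
            "classroom": ss_classroom / total,
            "student": ss_student / total}


if __name__ == "__main__":
    # Hypothetical data: two schools with two classrooms each.
    example = {"A": {"c1": [1, 3], "c2": [5, 7]},
               "B": {"c1": [9, 11], "c2": [13, 15]}}
    print(decompose(example))
```

For the hypothetical example, the shares come out at roughly 76 percent school, 19 percent classroom, and 5 percent student, a pattern closer to strongly sorted schools than to the Japanese one.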

The Belgium Flemish, New Zealand and United States of America Cases

It is obvious from Figure 1 that different systems have different policies for allocating students to classrooms or schools. In another paper (Kifer, in press), I have done detailed analyses of the nature and consequences of such policies in Belgium Flemish, New Zealand and the United States. Here I will highlight those findings rather than portray them in detail.

Different Types of Tracking Have Different 
Consequences 

At this grade level Belgium Flemish has different types of schools for its pupils, and those pupils are exposed to different amounts of mathematics. The United States has different types of mathematics classrooms within each school, and in those classrooms students are exposed to radically different types of mathematics. In New Zealand schools, students are sorted into classrooms, apparently by measures of previous achievement, and then given either more or less mathematical content.

The most dramatic example of how tracking policies influence the mathematical content to which students are exposed comes from the United States. Figure 2 is a series of box-and-whisker plots which describe, for four distinct classroom types in the United States, teachers' ratings of the Opportunity to Learn (OTL) the mathematics reflected by the SIMS achievement test. OTL was gathered by asking each teacher to look at each test item and decide whether or not the material needed to answer the question correctly had been taught to the students.
Figure 2. OTL by content area for United States of America class types.
It is evident from Figure 2 that sorting students and differentiating the curriculum are two sides of the same coin. Those students, for example, who are in Algebra classes (the high track in the United States) are exposed to very different kinds of material than those in other types of classrooms.

Though the effect is not nearly so dramatic as in the United States, tracking of students leads to different types of exposure to mathematics in both Belgium Flemish and New Zealand as well. Those differences, however, are both smaller in magnitude and of a different kind. In those two systems, students in "better" tracks tend to be exposed to more mathematics.

The Sorting Is Inefficient 

In each of these three systems it can be assumed that the procedures used to allocate students to classrooms are meant to be rational and efficient. The analyses suggest, however, that if these systems are operating meritocratically (that is, if it is desired that the best students be in the highest tracks and vice versa), they are not doing very well.
Figure 3. Distribution of pretest scores by school type in Belgium Flemish.
Figure 3 shows the distributions of pretest scores by school type in Belgium Flemish. What is worthy of note is that a substantial number of students in vocational and technical schools have pretest scores on the SIMS test that are above the average for the traditional and comprehensive schools. Hence, if the selection was made by the system (as opposed to individual choice) and based on merit, quite a number of students have been misclassified.

In the United States, of those students in the top 10 percent of the distribution of pretest arithmetic scores, only one-half were placed in algebra classrooms. Of the students in the top quarter, slightly less than one-third were in algebra.

In New Zealand, students who appeared to be high scorers in one school would be placed among the low scorers in another. Hence, New Zealand too was making a substantial number of classification errors, if merit or prior performance was meant to be the means by which students obtained preferable curricular experiences.
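Figures such as "of those in the top 10 percent of pretest scores, only one-half were placed in algebra" are conditional proportions. A minimal Python sketch of that calculation follows, on hypothetical data and with an assumed rule for sizing the top group (the chosen fraction of the sample, rounded, with at least one student); the function `share_of_top_placed` is an illustrative name, not part of the study.

```python
def share_of_top_placed(scores, in_algebra, top_fraction):
    """Among the students in the top `top_fraction` of pretest scores,
    return the fraction that were placed in algebra classrooms.

    scores:      pretest scores, one per student (hypothetical input)
    in_algebra:  parallel list of booleans, True if placed in algebra
    """
    if not scores or len(scores) != len(in_algebra):
        raise ValueError("need non-empty lists of equal length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    k = max(1, round(len(scores) * top_fraction))  # size of the top group
    # Rank students by score, highest first (ties keep input order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    return sum(in_algebra[i] for i in top) / k


if __name__ == "__main__":
    # Hypothetical class of ten students.
    scores = [95, 90, 85, 80, 70, 60, 50, 40, 30, 20]
    placed = [True, False, True, True, False, False, False, False, False, False]
    print(share_of_top_placed(scores, placed, 0.2))
    print(share_of_top_placed(scores, placed, 0.4))
```

On this toy data the script prints 0.5 for the top fifth and 0.75 for the top two-fifths. Under a purely meritocratic placement both values would be 1.0 whenever the algebra class has at least that many seats.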

The Tracking Has Social Consequences 

Not only is tracking inefficient and error-prone, but the practice also has social consequences. Analyses (Kifer, 1984) of whether background characteristics of students were related to participation in the various tracks indicated social biases in that allocation.

Figure 4 depicts the relationship between fathers' and mothers' educational levels and whether a student was in a high or low scoring classroom in New Zealand. The high scoring classrooms had a disproportionate number of students whose fathers or mothers were highly educated. Conversely, low scoring classrooms were disproportionately populated with students whose fathers and mothers had lower levels of education. In the United States, students who are white, female, and from wealthy homes are more likely to be placed in the favored tracks. Those who are not white, are boys, and are not wealthy are more likely to be placed, regardless of test score, in the lower level classes. Class and gender effects are present in Belgium Flemish, but to a much lesser degree than in either the United States or New Zealand.

The Cases of France, Japan and Ontario

It is not the case that some systems track or sort students and others do not. It is a matter of when the sorting occurs, not whether it will occur. Yet the systems of France, Japan, and Ontario apparently have in place policies which attempt to ensure that virtually all students are exposed to common material at the Population A level.

Remembering that this population was chosen because in most systems it is the grade level where all students still take mathematics, these three systems have chosen to give the young common educational experiences in mathematics. Later, each will sort.

This egalitarian approach to mathematics in France is a result of national changes instituted in the educational system during the late 1960s. Concerns were expressed at that time about the lack of common opportunities available to students of this age cohort. Selection into types of curricula occurs in France during upper secondary school, rather than during this relatively early period of a student's school life. These results suggest that the new system gives more students a more equal chance of entering the most desirable educational route by guaranteeing equal opportunities through the elementary school years.
Figure 4. Percent of students by classroom type and educational level of parents in New Zealand.
For Japan, whose sample is one grade level earlier than the others in this set, entrance into upper secondary school marks the change from common opportunities to differentiated ones. These decisions (which students enter which type of school) occur about three years after this grade level and are based primarily on entrance examinations.

In the Ontario system, sorting occurs at the next grade level. As students enter secondary school, a number of different types of measures are used to determine which curricula they will participate in. The extent to which background characteristics of students are related to participation in the most favored curricula obviously cannot be addressed with these data; a data set for the subsequent year would be needed.

Summary 

The Population A systems provide a contrast between those with more or less egalitarian policies (France, Japan, Ontario) and those with meritocratic ones (Belgium Flemish, New Zealand, USA). Which is the preferred set? Some would argue that the merits of egalitarian versus meritocratic educational practices should be judged by differences in achievement rather than by differences in opportunities or in equality of participation. Previous IEA studies suggest that comprehensive schools do not negatively affect the performance of the most talented, and that selective schools do not necessarily enhance the performance of those who are enrolled there. Such analyses, however, have been based on older populations of students and may or may not be appropriate in this context.

There is little, if any, direct evidence among the participating systems of the efficacy, in terms of achievement, of either the egalitarian or the meritocratic approaches and practices. Since these are national systems and this is a sample survey, variables which may operate to produce high or low performance, and which distinguish between the systems or the contexts in which they operate, are simply not available. It would, therefore, take an extremely strong inference to state that, in terms of cognitive achievement as measured by the SIMS test, there is a decided advantage of one set of practices over the others.

Nevertheless, there may be findings and indirect evidence within the study that would allow one to prefer the practices of the egalitarian systems (Canada Ontario, France and Japan) over the others. First, Population A students in both France and Japan scored well on the cognitive tests and showed rather remarkable gains on subsets of the items. And, in previous analyses, it was shown that Canada Ontario, which is comparable to the United States in terms of both variance and achievement, shows slightly greater growth than does the United States. In addition, the patterns of gain for the two systems are very similar. Hence, straightforward comparisons, though arguably weak by nature of the design of the survey, show superiority on the part of egalitarian practices.

Logic, too, supports these egalitarian policies and 
practices. If a system wishes to select the most talented 
students and provide them with the best educational 
opportunities, then the longer that the selection is put 
off, the better it will be. 

The sorting of United States students, for instance, starts well before the Population A grade level, by which point the tracks are firmly in place. If a mistake in selection is made prior to grade eight, the child's school career is obviously affected. And there are no systematic ways, even if the child has the required talent, to rectify the mistake. The child could be very good but still be in a low track because an error was made early in his or her career. If, however, there were no tracking or selection in the United States, and no concomitant differentiation of curriculum, no opportunities would have been thrown away. Hence, the longer a system waits to sort, the more likely it is to have developed (in the talent sense) an identifiable cohort on which to sort. Since these three systems (Canada Ontario, France and Japan) have not yet sorted, their practices are preferred to those of the other systems, because up until now they have made fewer errors in the selection process.

Population B 

As will be shown later, policies adopted at the Population A level profoundly influence what can be done at Population B. Yet the issues of participation and exposure to mathematics content are different for the two populations.

Virtually all students in the 13-year-old population are taking some type of mathematics; by the end of secondary school, depending on the system, a large proportion of the cohort is either no longer enrolled in school, or not taking mathematics, or both. Figure 5 shows these proportions. The estimated percentage of the cohort still in school ranges from a high of over 90 in Japan to a low of 17 in New Zealand and England. The percentage of the cohort taking advanced mathematics courses ranges from a high of 50 percent in Hungary to lows of 6 percent in Israel and New Zealand.
Figure 5. Participation in schooling and advanced mathematics: Population B.
The United States has a relatively high rate of retention (82 percent, second only to Japan) and is in the middle in terms of the proportion of the cohort taking advanced mathematics.

Across these systems, two phenomena are evident. First, a selection has been made across the student cohort. That is, not all students progress through these systems until they reach the terminal year of secondary schooling. Depending on the system, student attrition can be a matter of dropping out of school and entering the job market, or there can be another, earlier school-leaving point at which the majority of students receive a certificate and leave school having completed the required number of years. In the latter systems, a minority of the cohort continues secondary school in order to prepare for university. Second, in most systems only a fraction of the students who remain in school are taking the most advanced mathematics offered by the secondary school. Some of the remaining students may be taking mathematics, but not at the highest level. These educational systems vary dramatically in the policies that determine which students remain in school and, of those, which continue to take advanced mathematics. The sections below focus on two extreme cases of dealing with these issues.

The Hungarian Example 

While most systems are very selective at this level, Hungary is a striking exception. Although "only" 50 percent of the cohort is still in school, all of those students are taking advanced mathematics. This finding suggests that very different policies inform the mathematics community in this system. One conjecture would be that the Hungarians do not believe they can afford to let mathematics be an elitist content area: mathematical knowledge is important enough to be part of every student's experience at this level.

Miller and Linn (1985) examined achievement patterns in light of the different retention rates in these systems. They report two findings relevant to the Hungarian system and to this paper. The first is that the average level of achievement for Hungary's students is close to the bottom among the systems; the second is that the top 1 percent and top 5 percent of Hungarian students perform near the top of the distribution of scores for these systems. From an international perspective, it appears that the Hungarian approach allows them to have it "both ways." Not only are they providing advanced mathematical experiences to a large percentage of the cohort, thereby dramatically increasing the sum of mathematical knowledge in the culture, but they are also doing so without sacrificing the talents of their most capable students. As a model for both providing opportunity and creating a pool of talent, Hungary's system bears scrutiny.

The Case of the United States

The situation in the United States is practically the opposite of the Hungarian one. In the United States there is a high retention rate but a modest percentage of students taking advanced mathematics.
The latter count, however, is misleading. There are, in fact, differentiated curricula at this level as well. Figure 6 shows how the United States stands when content areas are broken broadly into calculus and other courses. The results suggest that the cohort taking calculus in the United States is considerably smaller than in other systems. Since calculus is standard fare for these other systems, the United States percentage is really much lower than it appears. The calculus courses are further differentiated between those that are considered Advanced Placement and others. The number of students enrolled in Advanced Placement Calculus is extraordinarily small; it is estimated to be less than one percent of the cohort. So the American elite is very small, and a far smaller proportion of students in the United States is receiving mathematics experience comparable to that of students in other systems.

Conclusions 

Systems which track students early profoundly affect the chances of many students being exposed to much of the mathematics that is offered to students in educational systems that put off tracking until later. By grade eight in the United States, for example, less than 15 percent of the students are in a track that will allow them to take calculus in grade twelve. By grade twelve, another 10 percent or more of the cohort has dropped out, so that there is little participation in advanced mathematics in the United States compared to that in other countries.

Not only is participation in advanced mathematics low in the United States but, combined with the Population A findings, there is a serious question of whether the most talented students are enrolled in the advanced mathematics courses. If only half of the top ten percent of students are taking courses in grade eight which allow them to take the most advanced mathematics in grade twelve, it is conceivable (though not proven) that the students who do take the most mathematics are not the best ones. The best ones may have been selected out by errors of early tracking.
The Hungarian system shows another approach to educating students mathematically. Although its retention rate is lower than that of most other systems (50 percent of the cohort still in school), it does not differentiate its mathematics curriculum and therefore has much higher participation than other systems. Apparently, in Hungary mathematics is considered important enough to be offered to a large percentage of the cohort.

Selection Effects 

The fact that early tracking differentially affects the genders, persons from different social classes, and different ethnic groups raises additional issues. These differential participation rates raise two questions that are not easily answered. The first has to do with the issue of equity in general. Talented students who are poor and from minority backgrounds are being excluded from the fullest participation in school mathematics. This loss of human resources not only has implications for the knowledge of mathematics that informs a culture, but also raises moral issues.

The second issue is what to do about the first. SIMS provides results that identify the problem but, as is the case for many such projects, does not provide a basis for solving it. Because it is an international survey, however, and because these systems vary widely in their policies, different models are available to those who wish to change how students are educated mathematically.

The Problem is Participation 

It is interesting to note that by Grade 8 in the United States enough sorting of students has occurred that the percentage of students taking algebra is about equal to the percentages that take the most advanced mathematics offered at Grade 12 by educational systems in other countries. The tracking in the United States is so rigorous that participation in advanced mathematics in secondary school is, in effect, assured to be small. But these other systems are selective as well. Having only 10-15 percent of a cohort experience the best a school system has to offer in mathematics is by no means exceptional. Is not good mathematics too important to be offered to so limited a number of students? It appears to this writer that participation in the best a school has to offer is a major issue for each and every one of these systems.
CONTENT REPRESENTATION IN MATHEMATICS
INSTRUCTION: A CASE STUDY OF THREE COUNTRIES

Curtis C. McKnight • Thomas J. Cooney



A characteristic feature of mathematics instruction is that its mathematical content can be represented in a variety of forms. These forms often differ widely in their complexity. Further, they differ in the ease with which they may be comprehended and in the connections that may be made to the existing cognitive structures of learners.

For instance, when teaching the concept of common fractions, teachers can interpret such fractions, among other ways, as parts of a region compared to the whole of the region (presented as a figure divided into equal parts with some parts shaded and others not); as a division of integers; as related to physical measurements such as length, area or volume; or as a corresponding fraction in decimal form. These various representations of the fraction concept would certainly connect differently to the structures of existing knowledge of various learners. They vary in the degree to which they rely on more perceptual, iconic elements or on more abstract, symbolic elements, and are thus likely to be processed quite differently by different learners.

An essential element of the pedagogical task in 
mathematics is, then, the choice of one or more repre- 
sentations for the content to be taught, whether this 
decision is made by the teacher directly, by a group
creating a curriculum guide, or by the authors of a 
textbook. In any case, the teacher is the final arbiter of 
the pedagogy used and has the possibility of choosing 
content representations to supplement or replace those 
received from other sources. 

The Second International Mathematics Study's (SIMS) questionnaires on classroom processes for specific content areas yielded rich, detailed descriptions of the instruction provided for selected topics in the areas surveyed. The descriptive wealth of the data from these questionnaires offers the potential for casting considerable empirical light on questions about content representation in mathematics instruction.

The authors have taken the approach of examining "local" clusters of related information for selected subtopics (e.g., the concept of common fractions, the addition and subtraction of integers, finding the area of a parallelogram), rather than a strategy of looking at data at a more "global" level of topics which combine several subtopics (such as arithmetic, algebra, measurement). Aggregation to more global "topic" levels often confounded any explanatorily interesting classifications. The results of these investigations appear elsewhere.

There are many approaches to studying content
representation strategies as implemented in mathe- 
matics instruction. The most obvious would be to study 
the specific content representations implemented by 
teachers in various educational systems for various 
topics and instructional settings. Such an investigation 
of specifics would be profitable but, used to study a set 
of more than twenty subtopics available in the SIMS 
classroom process data, it would involve examination 
of a complex array of options implemented in an 
equally complex range of instructional settings set in a 
context of inter-classroom comparisons within each 
system investigated and of multi-system comparisons.
Specificity in the study of content representations is 
obtained at the price of large increases in the com- 
plexity of the phenomena to be explicated. 

It seems reasonable that the likelihood of iden- 
tifying essential structures and relationships in a set of 
phenomena is at best inversely proportional to the 
complexity of those phenomena. If variables that sim- 
plified the phenomena without destroying their essen- 
tial features could be attained, they should increase the 
likelihood of finding significant structural relation- 
ships. 

While this generalizing strategy was adopted for 
the more extensive investigations reported elsewhere, 
it seemed worthwhile to check the assumption of the 
value of this approach by seeking an opportunity to 
analyze at least one small topic area in all its specificity, 
to examine carefully the descriptive power of such a 
concrete approach, and to assess more directly through 
such an example the trade-offs between generality and 
specificity. The present study is an attempt to do this. 

The discussion which follows examines only one subtopic: common fractions instruction. It restricts attention to the educational systems of three countries: France, the United States and New Zealand. These systems were chosen because they provided clear contrasts in instructional approach for the chosen mathematical topic. In France, instruction on common fractions is largely delayed to the grade containing students about age 13 (Population A in SIMS), while such content is introduced much earlier in both the United States and New Zealand, but in quite different ways. Selection of this topic restricted use of SIMS data to that for Population A, that is, classrooms at the grade level at which the median mid-year age was 13. These restrictions have made possible a fairly detailed and specific look at how instruction in common fractions is carried out by the teachers in these three systems.

Resource and Time Use in Fraction Instruction 

Among the first concerns in instruction on any mathematical topic are whether the topic is to be treated as new or review material, how much time is to be allocated for instruction on its various aspects, and which resources are used in providing that instruction. SIMS data are used here to compare France (FRA), New Zealand (NZL) and the United States (USA) on these components of instruction in the concepts of, and operations with, common fractions.

Teachers were asked whether various aspects of common fractions instruction were taught as new content, reviewed and then extended, reviewed only, neither taught nor reviewed because they were assumed to be prerequisite knowledge, or not taught even without such an assumption. Figure 1 presents these data.



Figure 1. New vs. Review Instruction for Six Subtopics in Three Countries.

Figure 1 shows that for classrooms in France, almost all aspects of this material were presented as new content (which accords with national reports of the mathematics curriculum in France). In New Zealand, this material was often presented as new content but about equally often reviewed and extended. This suggests less uniformity in New Zealand's curriculum in this area, or the existence of two or more streams in the curriculum. In the United States, a small proportion of classrooms presented this content as new material while most reviewed and extended it or reviewed it only. This accords with the fact that there were four types of programs identified at this grade level in American schools, and only one of them, remedial classrooms, often treated this material, which had been in the curriculum for some years, as new content.

Just as the three systems differed in whether this content was treated as new or review material, they also differed in the amounts of time allocated to it. Figure 2 presents "box and whisker" plots of the distribution of time (in minutes) allocated to common fractions subtopics. In such a box and whisker plot, the box runs from the 25th percentile to the 75th percentile, with the line inside the box indicating the median. The lower "whisker" ends at the 5th percentile and the upper "whisker" ends at the 95th percentile. The box thus encloses the middle 50 percent of the distribution and the whiskers enclose the middle 90 percent. Figure 2a presents the total time indicated for common fractions instruction, while Figure 2b presents the time for the same six aspects of fraction instruction presented in Figure 1 plus an additional aspect, time devoted to applications and problem solving related to fractions (textbook word problems, problems related to real world situations, etc.).

Figure 2. Distribution of Time in Minutes Spent on Common Fraction Instruction.
From Figure 2a it can be seen that the least time was allocated to common fractions instruction in New Zealand and the most in France (where it was essentially a new topic). While there was considerable uniformity among the time allocations in New Zealand, there was considerably more diversity in both the United States and France.
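Each box-and-whisker plot in Figure 2 reduces one country's reported minutes to five numbers: the 5th, 25th, 50th, 75th and 95th percentiles. The sketch below computes that summary. The classroom minutes are hypothetical, and the interpolation rule (the standard library's `statistics.quantiles` with `method="inclusive"`) is an assumption, since the percentile convention used for the SIMS plots is not stated.

```python
import statistics
from typing import List, Tuple


def box_whisker_summary(minutes: List[float]) -> Tuple[float, ...]:
    """Return (p5, p25, median, p75, p95) for one country's time allocations.

    The 5th and 95th percentiles mark the whisker ends, the 25th and 75th
    the box edges, and the median the line inside the box.
    """
    if len(minutes) < 2:
        raise ValueError("need at least two classrooms to compute percentiles")
    # n=100 yields the 99 cut points p1..p99; cuts[k - 1] is the k-th percentile.
    # "inclusive" interpolates within the observed range of the data (an assumption).
    cuts = statistics.quantiles(minutes, n=100, method="inclusive")
    return tuple(cuts[k - 1] for k in (5, 25, 50, 75, 95))


# Hypothetical minutes reported by nine classrooms (illustrative only).
reported = [12, 30, 45, 45, 60, 90, 120, 150, 200]
p5, p25, median, p75, p95 = box_whisker_summary(reported)
print(f"whiskers {p5:.1f}-{p95:.1f}, box {p25:.1f}-{p75:.1f}, median {median:.1f}")
```

A narrow box and short whiskers correspond to the uniformity reported for New Zealand; a wide spread corresponds to the diversity reported for France and the United States.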

Figure 2b shows the spread of time allocations received by the seven aspects of fraction instruction. It can be seen that the addition and subtraction of fractions received relatively more attention in all three countries. The addition and subtraction of fractions also showed the greatest diversity in time allocations, followed closely by problems and applications of fractions. In all cases, France allocated more time than the others and New Zealand allocated less. The pattern in the overall time allocations of Figure 2a was consistently reflected across the seven subtopics of Figure 2b.

Teachers in the three systems also differed in the resources used for fractions instruction. The SIMS instruments distinguished between primary resources (those used frequently) and secondary resources (those used occasionally). Data were gathered on six categories of resources, any of which might be used by an individual teacher as either a primary or secondary resource.

The primary resource used by most teachers in all three countries was the student textbook. Other published textbooks and materials (workbooks, worksheets, etc.) were an important secondary resource in all three countries, although they served as a primary resource in only 10 to 20 percent of the classrooms. American teachers made slightly more use of both kinds of text materials than did teachers in France and New Zealand. Locally produced text materials were also an important secondary resource, and in France they were a primary resource for almost half the classes (significantly more than in the other two countries). By comparison, the other categories of resources (commercially or locally produced individualized materials; commercially or locally produced films, filmstrips, or teacher demonstration models; and commercially or locally produced laboratory materials for student use) were little used. While they served as a secondary resource for small percentages of teachers in the United States and New Zealand, they were virtually unused in France. New Zealand made somewhat more use of laboratory materials as a secondary resource than did the others.



A Look at Content Representation

One of the more interesting features of the SIMS instruments that gathered data on classroom processes was a set of questions examining the use of each of an array of content representations during instruction for specific subtopics. Part of the information gathered was whether a particular representation was emphasized ("used as a primary explanation, referred to extensively or frequently"), used but not emphasized, or not used at all.

For example, one question gathered data on the use of each of ten content representations for instruction on the common fraction concept. These data are presented in Figure 3. It can be seen that the representations most frequently used or emphasized in all three countries were fractions as decimals, fractions as quotients, and fractions as parts of regions. While about half of the teachers in all three countries emphasized fractions as parts of regions, considerably more of the teachers in the United States emphasized fractions as decimals and fractions as quotients than did those from the other two countries.

Figure 3. Representations for Common Fractions Emphasized, Used and Not Used in Instruction on the Common Fraction Concept.



Few other representations received emphasis from 25 percent or more of the teachers in a country, although several others received considerable use without emphasis. Representing fractions as the coordinates of points on a number line received a fair amount of emphasis in all three countries, especially in New Zealand. Interpreting fractions as ratios was emphasized by over 30 percent of the teachers in the United States, but very little in the other two countries. Representing fractions as comparisons was emphasized by about 25 percent of teachers in both France and the United States but received considerably less emphasis and use in New Zealand.

In summary, it appears that there are both important commonalities and important differences in the content representation patterns for instruction related to the common fraction concept. A core of three representations was most often emphasized, with a few others supplementing this core for at least some teachers. The range of representations emphasized seems fairly narrow, while the range of representations used but not emphasized was considerably wider.

A second question gathered information on interpretations of the addition of fractions. Interpreting the sum of two fractions as the union of two regions was emphasized by about 30 percent of the teachers in both France and the United States and used by about another 40 percent. Interpreting the sum of two fractions as the sum of two quotients was emphasized by over 30 percent of the French teachers and received considerable use in all three countries. Compared with the interpretations of the common fraction concept, very few of the interpretations of adding fractions received much emphasis in any of the countries, and even the use of various interpretations was relatively more restricted. This at least suggests that a richer array of content representations is brought into play for more conceptual topics than for more procedural topics.

Additional data were gathered about instruction on the addition of fractions. One question sought to determine which procedures for adding fractions were emphasized and used in the various educational systems. Six procedures were considered in the instrument: using the least common denominator (LCD) in a horizontal format, using the LCD in a vertical format, using any common denominator in a horizontal format, using any common denominator in a vertical format, using a formula such as

$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd},$$

or transforming to, and adding, equivalent decimals.
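The procedures listed above differ mainly in which common denominator is chosen and in whether decimals are used at all. The minimal sketch below contrasts them on the illustrative sum 1/4 + 1/6; the function names are hypothetical, and the formula procedure implements a/b + c/d = (ad + bc)/bd. The horizontal and vertical formats are ways of laying the work out on paper, so they are not modeled.

```python
from fractions import Fraction
from math import gcd
from typing import Tuple


def lcm(x: int, y: int) -> int:
    """Least common multiple of two positive integers."""
    return x * y // gcd(x, y)


def add_with_denominator(a: int, b: int, c: int, d: int, den: int) -> Tuple[int, int]:
    """a/b + c/d rewritten over a chosen common denominator `den`.

    Returns (numerator, denominator) without reducing the result.
    """
    if den % b or den % d:
        raise ValueError(f"{den} is not a common denominator of {b} and {d}")
    return a * (den // b) + c * (den // d), den


def add_lcd(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Least-common-denominator procedure."""
    return add_with_denominator(a, b, c, d, lcm(b, d))


def add_formula(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Formula procedure: a/b + c/d = (ad + bc)/bd."""
    return a * d + b * c, b * d


def add_decimal(a: int, b: int, c: int, d: int) -> float:
    """Decimal procedure: convert each fraction to a decimal, then add."""
    return a / b + c / d


# 1/4 + 1/6 by each procedure
print(add_lcd(1, 4, 1, 6))                    # (5, 12)
print(add_with_denominator(1, 4, 1, 6, 48))   # (20, 48)
print(add_formula(1, 4, 1, 6))                # (10, 24)
print(add_decimal(1, 4, 1, 6))
print(Fraction(*add_formula(1, 4, 1, 6)) == Fraction(*add_lcd(1, 4, 1, 6)))  # True
```

Only the LCD procedure yields a result already in lowest terms; the formula and "any common denominator" procedures give equivalent but unreduced fractions. The decimal procedure gives only an approximation here, because 1/6 has a repeating decimal expansion.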

There were some important differences among the countries. Using the LCD in a horizontal format received extensive emphasis in both France and the United States but relatively less in New Zealand, where using the LCD in a vertical format was emphasized more often instead (and a vertical format with any common denominator was used far more often than in either of the other two countries). Using decimals was emphasized frequently in all three countries but somewhat more often in New Zealand. Thus, there were fairly distinctive national patterns in the procedures developed for adding fractions, patterns that were less characteristic of the content representations chosen.

Data were also gathered on the techniques teachers used in teaching the addition of fractions. Three possibilities were considered: presenting only numerical examples to demonstrate the procedure, presenting numerical examples first and then the procedure symbolically ("example then rule"), or presenting the procedure symbolically and then illustrating it with numerical examples ("rule then example"). Patterns characteristic of the three countries stood out quite clearly. Few teachers in any of the countries made much use of the "deductive" approach of presenting the general rule and then numerical examples. About 75 percent of the teachers in the United States presented numerical examples only, while over 80 percent of the French teachers used the somewhat more formal approach of presenting numerical examples followed by a statement of the general rule or pattern. The teachers of New Zealand showed somewhat more diversity, with just over half presenting numerical examples only and a fair proportion presenting numerical examples followed by the general case.

A Second Look at Content Representation

Clearly there are many approaches to studying content representation strategies as implemented in mathematics instruction. The most obvious approach would be to study the specific content representations implemented, but applied to many cases it would involve a bewildering combination of complexities. Variables that simplified the phenomena without destroying their essential features should increase the likelihood of finding significant structural relationships.

For instance, examining the number of content representations used in a given instructional setting, rather than the specific representations used, offered parsimony and the possibility of greater generalizability and explanatory power, though at some risk of missing relationships tied to the specifics of the situations. Thus, one characteristic of interest was simply the number of content representations used by each teacher in instruction related to a subtopic. This was captured in a variable, VARIETY, a simple count of the number of different content representations emphasized or used.
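Because VARIETY is a plain count, it is straightforward to compute once each teacher's answers are recorded. The sketch below assumes, hypothetically, that a teacher's answers are stored as a dictionary mapping each listed representation to "emphasized", "used" or "not used"; the representation labels are illustrative. The optional `emphasized_only` flag gives the more restricted, emphasized-only counterpart that the authors also consider.

```python
from typing import Dict

# Response levels as recorded for each listed representation (assumed encoding).
EMPHASIZED, USED, NOT_USED = "emphasized", "used", "not used"


def variety(responses: Dict[str, str], emphasized_only: bool = False) -> int:
    """Count the representations a teacher reported for one subtopic.

    By default this counts representations emphasized or used (VARIETY);
    with emphasized_only=True it counts only those emphasized.
    """
    counted = {EMPHASIZED} if emphasized_only else {EMPHASIZED, USED}
    return sum(1 for level in responses.values() if level in counted)


# Hypothetical responses from one teacher (labels are illustrative).
teacher = {
    "parts of a region": EMPHASIZED,
    "quotient": USED,
    "decimal": EMPHASIZED,
    "ratio": NOT_USED,
}
print(variety(teacher))                        # 3
print(variety(teacher, emphasized_only=True))  # 2
```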

A second characteristic of interest was the relative balance in instruction on a subtopic between representations whose form emphasized more perceptual elements (e.g., shaded regions for interpreting fractions) and those which emphasized more symbolic forms (e.g., fractions as divisions). This balance between perceptual-form and symbolic-form representations was indexed by a variable, BALANCE (OF EMPHASIS). It was calculated as the proportion of symbolic representations used (the number of "symbolic" interpretations used, divided by the total number of possible symbolic representations on the list for that subtopic) minus the proportion of perceptual representations used (the number of more perceptual representations used, divided by the total number of possible perceptual representations). Defined in this way, BALANCE took numerical values from -1 through +1. A positive value indicated relatively more emphasis on the symbolic, a negative value relatively more emphasis on the perceptual, and a value close to 0 relatively balanced use of both perceptual and symbolic representations.
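The BALANCE definition translates directly into a difference of two proportions. The sketch below is a minimal illustration under stated assumptions: responses use the same hypothetical dictionary encoding as for VARIETY, and because the text does not name which representations were classed as symbolic, the caller supplies that set. The placeholder labels S1 to S4 and P1 to P6 reflect the split into four symbolic and six perceptual representations described later for the fraction concept. Passing `counted=("emphasized",)` yields the emphasized-only variant.

```python
from typing import Dict, Iterable, Set


def balance(responses: Dict[str, str], symbolic: Set[str],
            counted: Iterable[str] = ("emphasized", "used")) -> float:
    """BALANCE = (symbolic representations counted / symbolic listed)
               - (perceptual representations counted / perceptual listed).

    `responses` maps every listed representation to its response level;
    names in `symbolic` are treated as symbolic and all others as perceptual.
    """
    levels = set(counted)
    sym = [r for r in responses if r in symbolic]
    per = [r for r in responses if r not in symbolic]
    if not sym or not per:
        raise ValueError("need at least one symbolic and one perceptual representation")
    p_sym = sum(responses[r] in levels for r in sym) / len(sym)
    p_per = sum(responses[r] in levels for r in per) / len(per)
    return p_sym - p_per  # always between -1 and +1


# Hypothetical list: four symbolic (S1-S4) and six perceptual (P1-P6) representations.
symbolic = {"S1", "S2", "S3", "S4"}
responses = {name: "not used" for name in symbolic | {f"P{i}" for i in range(1, 7)}}
responses.update({"S1": "emphasized", "S2": "used", "S3": "used",
                  "P1": "used", "P2": "emphasized"})
print(round(balance(responses, symbolic), 3))  # 0.417
print(round(balance(responses, symbolic, counted=("emphasized",)), 3))
```

Dividing by the number of listed representations in each category, rather than by the total, keeps the index from favouring the larger (perceptual) category simply because it has more entries.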

Alternative, more restricted counterparts to VARIETY and BALANCE could be obtained by the same quantifying operations using only those representations which were emphasized, and not those which were used but not emphasized. These alternative definitions might give a very different picture of the "heart" of content representation than that provided by the more inclusive definitions.

The data showed that a relatively large number of representations (5-8) were used in all three countries, with the United States showing somewhat greater diversity of use. In comparison, all three countries emphasized a far more restrictive set of representations, with France showing slightly greater diversity in the representations emphasized.

A sense of these data can be given by graphing 
the percent of teachers in each country who use the 
various numbers of representations possible (0 to 10 for 
instruction on the common fractions concept). 
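Graphs of this kind plot, for each possible VARIETY score, the percentage of a country's teachers with that score; the most frequent score is the modal number discussed below. A minimal sketch, assuming the per-teacher VARIETY scores have already been computed (the eight scores here are hypothetical):

```python
from collections import Counter
from typing import Dict, List


def variety_distribution(scores: List[int], max_reps: int = 10) -> Dict[int, float]:
    """Percent of teachers at each VARIETY score from 0 to max_reps."""
    if not scores:
        raise ValueError("no teacher scores supplied")
    counts = Counter(scores)  # missing scores count as 0
    n = len(scores)
    return {k: 100.0 * counts[k] / n for k in range(max_reps + 1)}


def modal_variety(dist: Dict[int, float]) -> int:
    """Most common VARIETY score (the lowest score wins a tie)."""
    return max(dist, key=dist.get)


# Hypothetical VARIETY-used scores for eight teachers in one country.
scores = [5, 6, 6, 7, 5, 6, 8, 3]
dist = variety_distribution(scores)
print({k: v for k, v in dist.items() if v})  # {3: 12.5, 5: 25.0, 6: 37.5, 7: 12.5, 8: 12.5}
print(modal_variety(dist))                   # 6
```

Running the same function on emphasized-only scores gives the second, more restricted set of graphs.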
Figure 4 presents such graphs, both for the VARIETY of representations used and for that emphasized. Figure 4 makes clear how much difference there is between the VARIETY emphasized and that used, and how New Zealand differs from the other two countries.

A similar analysis showed both the similarities and the differences in the VARIETY used and emphasized for instruction on common fraction addition. The VARIETY of representations used is somewhat more "diffuse" (more spread out and less "peaked") for instruction on fraction addition than for fraction concept instruction. However, the VARIETY of representations emphasized was very restricted for fraction addition. For New Zealand and the United States, the modal number was zero; that is, no representations were emphasized by over half of the teachers in these two countries. For France the modal number emphasized was one. Thus, there was a marked and suggestive difference between instruction for the conceptual and procedural aspects of this topic.

Figure 4 showed some of the benefits of abstraction in comparison with the more specific data on representations presented in Figure 3. Another way to examine this trade-off between specificity and abstraction more directly is to create something like a "power curve" for each representation. This is done by plotting, for each level of the VARIETY variable, the percentage of teachers in each country using or emphasizing that specific representation. Figures 5 and 6 present such graphs for the common fraction concept for two of the countries.




Figure 5. Percent Using and Emphasizing Specific Representations for Common Fractions Concept by Differing Variety for France.
The ten representations available for the common fractions concept are categorized into two groups: symbolic and perceptual representations. The left column of each figure contains curves for the four relatively more symbolic representations and the right column those for the six relatively more perceptual representations. Each graph contains two curves: an upper, black curve for VARIETY of representations emphasized or used and a lower, gray curve for VARIETY of representations emphasized. By the nature of the case, all teachers with a VARIETY used-or-emphasized score of 10 have used or emphasized all listed representations, and thus each of the upper, black curves must end at the maximum of 100 percent. Such is not the case for the lower, gray curve. The height of each curve and how "early" (how far to the left) it begins to climb significantly reveal something of how central that representation is to the instruction of a particular country on this topic.

Figure 5 reveals for France that four representations constituted something of a core of highly used representations. These included the symbolic representations of quotients, decimals, parts of a region and points on a number line. In terms of what is emphasized, the gray curves show that quotients and parts of a region were the most commonly emphasized representations in the core.

This core was supplemented by a "shell" of other interpretations, including all except fractions as ratios, which was virtually never emphasized and was used basically only by those who reported making use of nine or ten representations. Of the others, fractions as comparisons, as measurements and as repeated unit fractions showed somewhat greater emphasis than did fractions as parts of a collection or as operators.

The core set of representations for New Zealand was similar to that for France, including the same four as before but in addition including relatively high use of fractions as repeated unit fractions and as parts of a collection. The level of emphasis for these latter two representations suggested, however, that the core for New Zealand is not unlike that of France. The shell of supplementary representations was also very similar to that of France, except that slightly more use was made of fractions as ratios and less of fractions as operators.
Figure 6 shows that the core representations for the United States differed slightly from those of the other two. Core representations here included fractions as quotients, as decimals and as coordinates of points on a number line, but with much less extensive use of fractions as parts of a region. Fractions as ratios received sufficient use and emphasis that it might well also be considered a core interpretation, in marked contrast to France and somewhat to New Zealand. Only fractions as operators appear not to be a significant part of the shell of supplementary representations.

BALANCE, the other variable abstracted from the specific representations, offers some hope of being even more revealing and of having even more explanatory power than does the variable VARIETY.



Figure 7. Distributions of BALANCE for Representations Emphasized or Used in Common Fractions Concept Instruction. (a) Distribution of BALANCE in Representations Used in Common Fractions Concepts Instruction. (b) Distribution of BALANCE in Representations Emphasized in Common Fractions Concepts Instruction.
Figure 7 presents box-and-whisker plots of the distributions of BALANCE both for representations used (or emphasized) and for those emphasized only, for instruction on the common fraction concept.

BALANCE scores greater than zero indicate relatively greater emphasis on the symbolic while scores less than zero indicate relatively greater emphasis on the perceptual. In Figure 7a the United States shows a distribution that is centered on zero and indicates a relative balance of emphasis in the representations used. By contrast, the other two countries show much more of an emphasis on the symbolic. Further investigation would be needed to determine just which representations provide that symbolic emphasis, but the distributions of the BALANCE variable are enough to reveal some clear national differences.

Figure 7b contains the distributions of BALANCE for just those representations emphasized and presents an interesting contrast to Figure 7a. The United States is seen to put relatively more emphasis on the symbolic than do the other two countries. Clearly there are differences between what is emphasized and what is merely used. This suggests that what is emphasized may have greater explanatory potential than consideration of what is used.
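This section does not give the formula for BALANCE, so the following Python sketch assumes one for illustration: the share of the four symbolic representations a teacher selected minus the share of the six perceptual ones. That makes scores positive when the symbolic side dominates, matching the sign convention described above, but it is a hypothetical definition, not the study's. The sketch also computes the five numbers (minimum, Tukey's hinges, median, maximum) that a box-and-whisker plot such as Figure 7 draws; the data are invented.

```python
from statistics import median

N_SYMBOLIC, N_PERCEPTUAL = 4, 6  # group sizes given in the text


def balance(n_symbolic, n_perceptual):
    """Hypothetical BALANCE index: share of the symbolic representations a
    teacher selected minus share of the perceptual ones. Positive values
    lean symbolic, negative values lean perceptual (assumed definition)."""
    return n_symbolic / N_SYMBOLIC - n_perceptual / N_PERCEPTUAL


def five_number_summary(values):
    """Tukey's five numbers drawn by a box-and-whisker plot:
    (minimum, lower hinge, median, upper hinge, maximum)."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # each half includes the median when n is odd
    return xs[0], median(xs[:half]), median(xs), median(xs[n - half:]), xs[-1]


# (symbolic, perceptual) counts for five hypothetical teachers.
counts = [(4, 0), (2, 3), (1, 3), (3, 3), (4, 6)]
scores = [balance(s, p) for s, p in counts]
print(five_number_summary(scores))  # (-0.25, 0.0, 0.0, 0.25, 1.0)
```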



A similar picture emerged from examining BALANCE in fraction addition instruction. The United States again showed a relatively balanced use but a more symbolic emphasis. The pattern for France did not differ significantly from that in Figure 7. New Zealand both emphasized and used the perceptual more than the other two countries (or perhaps used the symbolic relatively less).

A Look at Effectiveness

This survey of the descriptive and explanatory potential of the SIMS data would not be complete without a look at the student achievement data and its links to the content representation data already discussed. Out of the pool of about 180 achievement test items at the Population A level of SIMS, 12 in particular dealt with common fractions concepts, computations and applications. Data for each of these twelve were examined, and the patterns were similar regardless of whether the specific item dealt with concepts, computations or applications. A few basic points will be made here, but restriction to a single case study has limited explanatory findings to being suggestive at best.

The most obvious predictor of end-of-year performance on any item for any class is beginning-of-year performance on the same item. With this in mind, simple linear regressions were run for each item with classes from all three countries pooled.

The (studentized) residuals for each class were plotted as an indication of whether that class, at the end of the year, performed above or below what might be expected based on its pretest performance. By looking at the set of residuals separately for each country, some indication of "overachieving" and "underachieving" countries can be seen.

Figure 8. Pretest vs. Studentized Residual Class Achievement Scores for a Fraction as Part of a Region Concept Item for Three Countries. Panels: (A) France, (B) New Zealand, (C) United States.

Figure 8 presents these plots for the common fractions concept item. It can be seen clearly that France had more classrooms with residual (gain) scores above zero than below; New Zealand had far more below than above; and the United States class residuals were scattered fairly evenly and randomly above and below zero. This indicates that, in comparison to the other two countries, France performed better than expected, New Zealand less well than might be expected and the United States somewhere in the middle. Results for the other achievement test items were similar. It should also be noted that the horizontal spread was different for France. Since common fractions were essentially new content for the target year in France, there were very few high pretest scores and thus much room for growth. This was not the case for the other countries.
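The pooled regression and the per-country tally of residuals above and below zero can be sketched in plain Python. This is an illustration under stated assumptions, not the SIMS analysis: the class data are invented, the function names are hypothetical, and the residuals are the internally studentized kind, e_i / (s * sqrt(1 - h_ii)); the study may have used a different variant.

```python
import math


def studentized_residuals(pre, post):
    """Regress post on pre by ordinary least squares and return each point's
    internally studentized residual e_i / (s * sqrt(1 - h_ii))."""
    n = len(pre)
    x_bar, y_bar = sum(pre) / n, sum(post) / n
    sxx = sum((x - x_bar) ** 2 for x in pre)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(pre, post)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(pre, post)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # residual standard error
    return [
        e / (s * math.sqrt(1 - (1 / n + (x - x_bar) ** 2 / sxx)))  # leverage h_ii
        for x, e in zip(pre, resid)
    ]


def count_above_below(countries, residuals):
    """Tally classes above and below zero, separately for each country."""
    tally = {}
    for country, r in zip(countries, residuals):
        above, below = tally.get(country, (0, 0))
        tally[country] = (above + int(r > 0), below + int(r < 0))
    return tally


# Hypothetical class means (percent correct), pooled across two countries.
countries = ["FRA", "NZE", "FRA", "NZE", "FRA", "NZE"]
pre = [10, 20, 30, 40, 50, 60]
post = [40, 38, 55, 50, 70, 62]
r = studentized_residuals(pre, post)
print(count_above_below(countries, r))  # {'FRA': (3, 0), 'NZE': (0, 3)}
```

Pooling the classes before fitting is what makes the residuals comparable across countries: each class is judged against one common pretest-to-posttest line, so a country whose classes sit mostly above it looks like an "overachiever".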

The possible explanations for these outcomes are many. The explanation may be as simple as a recency effect, since this material was new content in France and largely review content in the other two countries. More complicated explanations may be tied to the specifics of characteristic patterns of instruction or of teacher belief. Some of the more obvious analyses suggest that it may be hard to link achievement effectiveness to patterns of instructional strategy. For example, plots similar to those of Figure 8, but with the point for each class marked to indicate whether the teacher emphasized, used, or did not use the representation of fractions as parts of a region (the most directly relevant representation), showed that in none of the three countries were high residual gains consistently associated with emphasizing that particular interpretation. The findings were similar for other items that could be linked to specific representations. Thus, emphasis of a particular representation could not be directly linked to high gains, even on achievement test items for which that representation was particularly salient.

Conclusions

An underlying theme of the work presented here has been to investigate the importance of specificity in description and explanation as opposed to the use of parsimonious and simplifying abstractions of the data (various variables and indices), which offer the potential for powerful explanation but at the cost of sacrificing concreteness and detail and running the risk of missed connections. As in almost everything else about the SIMS classroom process data, the results are mixed.

Certainly the specifics of description are rich and are worthy of further study for identifying important national characteristics. In contrast, in some cases the abstractions revealed patterns that were hard to see among the "trees" of the "forest". For instance, the VARIETY variable showed considerable difference between the use of representations in conceptual and procedural aspects of common fraction instruction, and the BALANCE variable showed some important characteristic patterns and some important differences based on what was emphasized versus what was merely used.

Neither strategy is adequate by itself for the search for effective description and explanation. The investigation of a large array of subtopics by more abstract indices has its place in exploring the presence or absence of characteristic patterns and general principles. However, such strategies must be supplemented by other investigations that focus in more detail on the specifics of a smaller number of cases to bring out characteristics and connections that might otherwise be missed.



TEACHING PRACTICES EMPLOYED IN THE TEACHING OF 
ALGEBRA AND GEOMETRY 



David F. Robitaille 



As part of the longitudinal component of the Second International Mathematics Study, questionnaires were administered to participating teachers at the Population A (13-year-old) level to obtain highly specific information about the teaching practices they employed in their classrooms. The five questionnaires, which were specially developed for use in the international study, dealt with the topics of algebra (integers, formulas, and equations); geometry; fractions; ratio, proportion, and percent; and measurement.

The importance of each of these topics in the Population A curriculum varies considerably from one jurisdiction to another, although algebra and geometry appear to be constant. That is to say, these two topics figure rather largely in the curriculum of each participating jurisdiction, although not equally so. By way of illustration, Table 1 presents a summary of the percent of class time in the Population A year devoted to the teaching of the five topics.

The results displayed in Table 1 for the teaching of algebra may be somewhat conservative estimates of the actual situation since, on that questionnaire, teachers were asked to report how much time they devoted to the teaching of integers, formulas, and equations only, and not to other algebraic topics which might form part of their curriculum. Thus, in Belgium (Flemish), the teachers reported that they spent approximately 48 percent of the total time devoted to the teaching of mathematics in the Population A year on the teaching of integers, formulas, and equations. It may well be that additional time was spent dealing with other algebraic topics, but teachers were not asked about those on the questionnaire.

The caution expressed in the preceding paragraph 
applies to a certain degree to each questionnaire and to 
each jurisdiction in which the questionnaires were 
used. Although a questionnaire bears a certain content 
label, the precise connotation of that label is somewhat 
unclear. The applicability of the Algebra question- 
naire to the French situation is illustrative. 

In the curriculum analysis phase of the Second International Mathematics Study, France was categorized as being one of the jurisdictions which placed a heavy emphasis on the teaching of algebra at the Population A level; however, Table 1 indicates that French teachers stated that only 11 percent of class time was devoted to the study of topics covered in the Algebra questionnaire. This is undoubtedly a matter of the definition of the term "algebra"; i.e., what constitutes algebra in the French curriculum is probably different in many important respects from what constitutes algebra in the questionnaire developed for use in this study. That questionnaire dealt with the teaching of integers, formulas, and equations. Much of this material is treated in earlier grades in France, and little or no time is devoted to it during the Population A year. We know from the questionnaire that French teachers spend about 11 percent of class time on the topics covered in the Algebra questionnaire. We do not know anything about how much time is spent on other algebraic topics.



Table 1

Time Spent on Questionnaire Topics
(Percent)

Topic                   BFL  CBC  CON  FRA  JPN  NZE  THA  USA
Algebra                  48   23   17   11   35   12   16   16
Geometry                 27   17   13   37   17   15   12   12
Fractions                 *   16   14   20    *   12   14   17
Ratio, Prop., Percent     *   11   12    6    *    5    8   11
Measurement               *   12   14    3    *    8    9    8
TOTAL                    75   79   70   77   52   52   59   64

* Questionnaire not used.

BFL = Belgium (Flemish), CBC = Canada (British Columbia), CON = Canada (Ontario), FRA = France,
JPN = Japan, NZE = New Zealand, THA = Thailand, USA = United States of America.
Bearing in mind the limited scope of the curricular content covered by the questionnaires and the inherent limitations of self-report data, it is important to recognize the uniqueness and importance of their contribution to our knowledge about what transpires in mathematics classrooms around the world. The questionnaires were designed especially for use in this study, and were extensively pilot-tested in several of the participating jurisdictions to enhance the validity of the results obtained. Little is known about what actually transpires in classrooms, and these questionnaires provided a way of obtaining comparative data from a variety of jurisdictions on the teaching practices employed in the teaching of mathematics.

Of the five questionnaires, two were used in all eight participating jurisdictions. In some places only two were used because the topics treated in the other questionnaires were not as important in the mathematics curriculum at that level; in others, it was decided that asking teachers to complete five extensive questionnaires was not a good idea. In this paper, results from the two questionnaires used in all eight systems are considered. An analysis of the data from all five questionnaires will form part of the international report of the longitudinal component of the study; that report is expected to appear in the near future.



The Teaching of Algebra

The box-and-whisker (Tukey, 1977) plots in Figure 1 summarize the distributions of amounts of time spent on the topics covered in the Algebra questionnaire. The median number of hours internationally was 23. Belgium not only reported the highest median number of hours spent on algebra, 67 out of a total of 140, but it also had the widest spread, indicating a considerable degree of variation within the country. All of the other countries have fairly narrow spreads.

Figure 1. Time spent on algebra

The graphs for all but two of the systems include several outliers, especially those for France and the United States. For the United States, the outliers represent Population A classes taking a full year of algebra, while most American students would not take such a course until the year following the Population A year.

Topics Taught

Eleven topics under the sub-headings Integers, Formulae, and Equations were treated in the Algebra questionnaire. Taken together, these eleven topics constitute the definition of algebra at the Population A level for the classroom-process phase of the study. A list of the topics and the percent of teachers who either taught or reviewed them is shown in Table 2.
Table 2

Algebra Topics Taught or Reviewed
(Percent)

Topic                       BFL  CBC  CON  FRA  JPN  NZE  THA  USA
Integers
  concept of                 76  100   95   73   99   98  100   99
  addition                   87  100   95   87   99   98  100   99
  subtraction                89  100   95   88   99   98  100   98
  multiplication             85  100   92   89   99   95   98   96
  division                   86  100   91   91   99   93   99   97
  properties                 89   85   65   90   98   75   99   82
  order relations            81   92   82   93   99   96   92   93
Formulae
  evaluate formulae          75   95   95   82   99   90   96   94
  derive formulae            42   64   62   60   99   31   82   61
Equations
  solve literal eqns. (a)    40   30   42   50   68   19   79   40
  solve linear eqns. (b)     95   96   92  100   99   87   97   92

(a) linear equations of the first degree, in one unknown, with literal coefficients
(b) linear equations of the first degree, in one unknown, with numerical coefficients



Of the 11 topics, nine were either taught or re- 
viewed by virtually all teachers in every country. The 
exceptions were deriving formulae or equations and 
solving literal equations. These were taught by signifi- 
cantly fewer teachers than the other topics. 

The major difference among systems regarding coverage of the 11 topics was whether the material was considered to be new for students at this level or was customarily taught earlier. In France and Belgium, almost all of the material dealing with integers is apparently taught before the Population A year and reviewed or extended during this year. In Japan, on the other hand, almost all teachers reported that the material dealing with integers was being presented to students as new material. There was considerably less unanimity within most countries regarding the teaching of topics dealing with formulae and equations. Only the Japanese had high proportions of teachers indicating that these topics were taught to students for the first time in the Population A year.

Integer Concepts

The choice of which pedagogical approaches to use in the teaching of algebra seems to depend on the subject matter being presented and whether or not the topic is being introduced for the first time. For example, in introducing the concept of an integer, over 70 percent of Population A teachers in countries other than Belgium and France emphasize the use of a number line where the integers are seen as an extension of the natural numbers, or as coordinates of points on the number line. The number line may also be used to illustrate operations with integers, particularly addition, subtraction, and multiplication.

Item 082

The set of integers less than 5 is represented on one of the number lines shown below. Which one?

[Answer choices A to E: five number-line diagrams, not legible in the scan.]

             Pretest (%)               Posttest (%)
System Correct Incorrect  Omit    Correct Incorrect  Omit    Change
BFL         48        47     5         54        43     3         6
CBC          *         *     *         66        28     6       n/a
CON         37        55     8         56        42     2        20
FRA         46        28    26         58        34     8        12
JPN         40        57     2         55        43     2        15
NZE         41        54     4         57        42     1        16
THA         26        69     5         39        60     0        13
USA         39        56     4         51        49     1        11

* Item not included in the pretest.

Item 082 was the only test item that dealt directly with the use of the number line as a means of representing integers. Growth scores exceeded 10 percentage points in six of the seven systems that made use of this item in the pretest, but pretest scores were quite low. Posttest scores were rather disappointing, with the highest being 66 percent correct in Canada (B.C.). In every other place, the posttest score was less than 60 percent.

It is difficult to explain why students seemed to find this item so difficult. There was universal agreement that the item was appropriate for this level, and that the material had been taught. Incorrect answer choices were divided more or less equally among the four distractors, and rates of omission were not at all high.

Item 014 also involved use of the number line, but in that instance students were asked to order three numbers including a negative rational, viz. -1/2. Overall, performance was no better on this item than on Item 082.

Over 70 percent of teachers in every country except Belgium and France reported that they emphasized the use of the number line in teaching integer concepts. In France and Belgium, where this material is usually taught initially before the Population A year, the approach taken is much more abstract and related to axiomatic structures. Thus, Belgian teachers were much more likely to refer to integers as vectors or directed segments than teachers elsewhere, while more teachers in France than anywhere else emphasized a definition of an integer as an equivalence class of ordered pairs of whole numbers:

$$-2 = \{(0,2),\ (1,3),\ (2,4),\ \ldots\}$$

or

$$-2 = \{(a,b) \in W \times W \mid b = a + 2\}$$

where $W$ is the set of whole numbers.
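The equivalence-class definition can be made concrete in a short Python sketch. It reads the pair (a, b) as a - b, as in the examples above, so that (a, b) and (c, d) name the same integer exactly when a + d = b + c. The class name and its methods are illustrative assumptions, and the multiplication rule is the standard one for this construction rather than something stated in the text. The usage lines check two of the equalities above and work the products and differences asked for in Items 012 and 113.

```python
class Integer:
    """An integer as an equivalence class of ordered pairs (a, b) of whole
    numbers, where (a, b) stands for a - b and (a, b) ~ (c, d) exactly when
    a + d == b + c. Each class is stored by its representative with a 0."""

    def __init__(self, a, b):
        if a < 0 or b < 0:
            raise ValueError("components must be whole numbers")
        m = min(a, b)
        self.pair = (a - m, b - m)  # e.g. (2, 4) is stored as (0, 2)

    def __eq__(self, other):
        return self.pair == other.pair

    def __hash__(self):
        return hash(self.pair)

    def __add__(self, other):
        (a, b), (c, d) = self.pair, other.pair
        return Integer(a + c, b + d)

    def __neg__(self):
        a, b = self.pair
        return Integer(b, a)

    def __sub__(self, other):
        return self + (-other)

    def __mul__(self, other):
        # (a - b)(c - d) = (ac + bd) - (ad + bc)
        (a, b), (c, d) = self.pair, other.pair
        return Integer(a * c + b * d, a * d + b * c)

    def __repr__(self):
        a, b = self.pair
        return str(a - b)  # ordinary signed notation, for display only


minus_two = Integer(0, 2)
print(Integer(1, 3) == minus_two, Integer(2, 4) == minus_two)  # True True
print(Integer(0, 2) * Integer(0, 3))                           # 6
print(Integer(0, 6) - Integer(0, 8))                           # 2
```

Note that no negative number is ever stored: every arithmetic step works on whole numbers only, which is the point of the construction.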

Another approach which is commonly used in the teaching of integers, everywhere except in Thailand, is the employment of examples of physical situations involving integers. Students discuss situations such as heights above or below sea level, temperatures above and below zero, and profit and loss, in which integers are used as vector quantities to convey a sense of both quantity and direction. Such examples were reported as being emphasized particularly by teachers in Canada (Ontario) and Japan, where integer concepts are introduced for the first time at this level and which had the youngest students participating.



Achievement results on items related to real-world applications of integers were rather disappointing. For example, on Item 013 students were asked to tell how much warmer a temperature of 31 degrees was than one of -7 degrees. The highest posttest score on this item was 70 percent in Belgium (Flemish). Next was Japan at 63 percent, and all the rest were less than 60 percent. The most popular distractor by far was 24 degrees, the algebraic sum of 31 and (-7). Over 20 percent of students in each of the eight systems chose this response. Given such relatively poor posttest results, it is not at all surprising to find that growth scores were very low: the highest was 10 percentage points in each of the two Canadian provinces. Teachers everywhere considered the item to be an appropriate one, and indicated that students had been taught the concepts and techniques involved. In spite of this, posttest results were quite poor.

Operations with Integers

Whether or not operations with integers such as addition, subtraction, and multiplication are being taught for the first time, teachers in most countries say that they emphasize rules for performing those operations rather than other approaches which attempt to build meaningful rationales for the algorithms employed. Exceptions to this trend were reported primarily in Canada and New Zealand.

Teachers in all countries are strongly of the opinion that students require a great deal of practice in order to become proficient in performing operations with integers. They also believe that students are not very interested in knowing why rules for performing operations with integers work, and this opinion undoubtedly contributes to their emphasis on such rules.

Performance on the three test items dealing with operations with integers (Items 012, 049 and 113) showed much greater growth overall, and higher posttest scores, than was the case for items dealing with integer concepts. For example, on Item 012, which required students to find the product of (-2) and (-3), the performance of students in Thailand increased by 53 percentage points, and in Ontario by 47 points, between pre- and posttest. In Japan, on Item 113, which required students to find the difference

(-6) - (-8)

performance grew by 53 points, to 72 percent correct. However, only one national posttest score on any of these computational items exceeded 80 percent, and there is some reason to doubt that students had achieved mastery of these algorithms, in spite of the opinions expressed by their teachers about the importance of practice.
Item 012

(-2) × (-3) is equal to:

A. -6   B. -5   C. -1   D. 5   E. 6

             Pretest (%)               Posttest (%)
System Correct Incorrect  Omit    Correct Incorrect  Omit    Change
BFL         66        33     1         78        21     2        12
CBC         36        59     5         72        27     2        36
CON         14        83     3         60        39     1        47
FRA         72        27     2         79        20     1         7
JPN          -         -     -         85        15     0       n/a
NZE         13        86     1         47        52     1        34
THA          9        91     1         62        38     0        53
USA         24        74     2         56        44     0        32

Item 049

-5(6 - 4) is equal to:

A. 50   B. 26   C. 10   D. -10   E. -26

             Pretest (%)               Posttest (%)
System Correct Incorrect  Omit    Correct Incorrect  Omit    Change
BFL         68        24     8         75        22     2         7
CBC          -         -     -         75        18     7       n/a
CON         58        31    11         65        32     3         7
FRA         66        24    10         75        19     5        10
JPN          -         -     -         78        21     1       n/a
NZE         53        38     9         61        36     2         8
THA         57        39     4         59        40     1         1
USA         59        35     6         65        34     1         5


Item 113

(-6) - (-8) is equal to:

A. 14   B. 2   C. -2   D. -10   E. -14

             Pretest (%)               Posttest (%)
System Correct Incorrect  Omit    Correct Incorrect  Omit    Change
BFL         46        52     2         57        42     1        11
CBC          -         -     -         49        49     2       n/a
CON         16        81     3         43        55     1        27
FRA         70        28     1         70        29     1         0
JPN         19        74     7         72        27     1        53
NZE         19        79     2         30        69     1        11
THA         17        82     1         32        68     0        15
USA         24        73     3         41        58     1        17
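The Change column in these item tables is the posttest minus the pretest percent correct. A short Python sketch recomputes it for Item 113 from the values in the table and confirms the 53-point gain for Japan cited in the text; the dictionary and function names are illustrative, and Canada (B.C.) is left out because it has no pretest score.

```python
# (pretest, posttest) percent correct on Item 113, transcribed from the table
# above; Canada (B.C.) is omitted because it had no pretest score.
ITEM_113 = {
    "BFL": (46, 57), "CON": (16, 43), "FRA": (70, 70), "JPN": (19, 72),
    "NZE": (19, 30), "THA": (17, 32), "USA": (24, 41),
}


def growth(scores):
    """Change in percentage points: posttest minus pretest percent correct."""
    return {system: post - pre for system, (pre, post) in scores.items()}


changes = growth(ITEM_113)
best = max(changes, key=changes.get)
print(best, changes[best])  # JPN 53
```

For a few rows of the other tables the recomputed difference is one point off the printed Change (for example, Canada (Ontario) on Item 082: 56 - 37 = 19 against a printed 20), presumably because the published percentages were rounded separately.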
Solving Equations

In teaching students how to solve equations of the first degree in one variable, e.g.

7x + 5 = 40

teachers in all countries reported emphasizing an algebraic approach based either on properties of equality or on the properties of additive and multiplicative inverses. Few emphasized other possible techniques such as trial and error. While this is perhaps not a surprising finding, it underscores an apparent tendency among teachers at this level to stress formal mathematical approaches to topics rather than more intuitive ones. It is particularly interesting to note that this is a widespread, if not a universal, tendency.

Two test items dealt explicitly with the solution of equations. On Item 086, students were required to solve the equation [illegible in the source]; on Item 151, 5x + 4 = 4x - 31. On neither item were there any posttest scores greater than 60 percent, and even in cases where growth was substantial the overall results were disappointing. For example, scores on Item 151 increased by 21 and 24 percentage points in Belgium and France, respectively. However, their posttest scores were only 53 and 42 percent correct. This can hardly be interpreted as a positive result.

Summary

The general impression that one obtains from studying performance on the algebra test items is that students found them difficult. Posttest scores were generally low, often surprisingly so. Teachers report having taught this material, and they appear to emphasize rules and abstract justifications in their teaching. These results point out a need for teachers, researchers, and curriculum developers to re-examine the teaching of introductory algebraic concepts and techniques to see whether this material can be taught more successfully at this level, or perhaps to recommend that these topics be delayed until students are better prepared to assimilate them.
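The algebraic approach that teachers reported for first-degree equations, applying additive and then multiplicative inverses to both sides, can be traced step by step for the two equations quoted in the Solving Equations discussion: 7x + 5 = 40 and Item 151's 5x + 4 = 4x - 31. The Python sketch below is a minimal illustration; the function name is hypothetical, and exact fractions are used so that non-integer solutions are not rounded.

```python
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d (integer coefficients, a != c) the way the
    teachers describe: apply additive, then multiplicative, inverses."""
    if a == c:
        raise ValueError("no unique solution")
    # Add -c*x to both sides:            (a - c)*x + b = d
    # Add -b to both sides:              (a - c)*x = d - b
    # Multiply both sides by 1/(a - c):  x = (d - b) / (a - c)
    return Fraction(d - b, a - c)


print(solve_linear(7, 5, 0, 40))   # 7x + 5 = 40      -> 5
print(solve_linear(5, 4, 4, -31))  # 5x + 4 = 4x - 31 -> -35
```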

The Teaching of Geometry 

The box-and-whisker plots shown in Figure 2 summarize the number of hours devoted to the study of geometry at the Population A level. Students in France spend twice, and in some cases three times, as much time on geometry as students in most other countries. In Belgium the median number of class hours per year for geometry was slightly lower than in France: 37 out of a total of 140 hours of mathematics for the year.
In the other participating systems, less than 20 percent of class time during the year appears to be devoted to geometry; in several cases, considerably less. The specific results are as follows:



Canada (B.C.) 15% 

Canada (Ontario) 10% 

Japan 17% 

New Zealand 12%

Thailand 12% 

United States 8% 



In most cases, the amount of time devoted to 
geometry was significantly less than that devoted to 
algebra. In the United States and Canada (Ontario), 
teachers reported spending more time on fractions than 
on geometry, and such results should be a cause for 
concern to mathematics educators, curriculum devel- 
opers, and classroom teachers. 

Problems with the status of geometry in the mathematics curriculum have been apparent for years. Since at least the time of the Royaumont Conference in 1959 (OEEC, 1961) and Dieudonné's ultimatum to the effect that "Euclid must go!", the geometry curriculum has been in disarray. What these data may indicate is that teachers, faced with such disarray, have decided that their valuable and limited class time would be better spent working on areas of the mathematics curriculum other than geometry. Geometry, according to these data, may well be on the endangered species list in mathematics education.



Geometric Content in the Curriculum

Sixteen topics, ranging from highly specific ones such as the Pythagorean Theorem to fairly broad themes such as transformations, were listed in the geometry questionnaire. Teachers were asked to indicate whether or not each of these topics formed part of their geometry curriculum, and whether the topics they did teach were taught as new or as review material. The sixteen topics were:

angles (right, acute, supplementary, etc.)
transformations (translations, reflections, rotations)
vectors
the Pythagorean Theorem
triangles and their properties (excluding congruence)
polygons and their properties (excluding properties related to congruent or similar polygons)
circles and their properties
congruence of geometric figures (including triangles)
similarity of geometric figures (including triangles)
parallel lines
spatial relations
geometric solids and their properties
geometric constructions with ruler and compass
proof (formal deductive demonstrations)
tessellations
coordinate geometry

Treatment of these topics varied considerably in different systems, as is shown by the median polish results in Table 3. Positive entries in the table indicate that a particular topic is given comparatively more importance in a given school system; negative entries indicate the opposite. Results greater than 15 in absolute value were considered significant for this analysis, and they are printed in bold type in the table.
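Median polish, Tukey's resistant fit for two-way tables, splits each cell into an overall effect, a row effect, a column effect, and a residual by repeatedly sweeping row and column medians out of the table. The following is a minimal generic Python sketch, not the procedure used for Table 3: the text does not give the iteration count, stopping rule, or input scale, so those are assumptions, and the toy inputs are hypothetical.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Tukey's median polish of a two-way table given as a list of rows.

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [[float(x) for x in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep each row's median out of the residuals and into the row effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [x - row_med[i] for x in resid[i]]
            row_eff[i] += row_med[i]
        # Re-centre the column effects, moving their median into the overall effect.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep each column's median out of the residuals and into the column effect.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [c + d for c, d in zip(col_eff, col_med)]
        # Re-centre the row effects, moving their median into the overall effect.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once a full sweep no longer moves anything noticeably.
        if max(abs(v) for v in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


# Hypothetical, perfectly additive 2 x 2 table: every residual should come out 0.
overall, rows, cols, resid = median_polish([[1, 2], [3, 4]])
print(overall, rows, cols)  # 2.5 [-1.0, 1.0] [-0.5, 0.5]
```

In Table 3's terms, the rows would be topics, the columns school systems, the rightmost column the row effects, and the cell entries the residuals. Medians are used instead of means so that one extreme system (such as the Belgian vectors entry) does not drag the fitted effects for everyone else.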

The large positive values in the rightmost column (the row effects) correspond to topics that are most likely to have been taught in these systems. The six topics so indicated are typical selections from plane Euclidean geometry: angles, triangles, polygons, circles, parallel lines, and ruler-and-compass constructions. The three topics with the most negative weightings (vectors, spatial relations, and proof) are the least likely to be taught among the sixteen listed.

This set of topics did not fit the curriculum particularly well in Belgium or France. The individual cell residuals for those countries show that they place much greater emphasis on transformations, vectors, and formal proof than do teachers elsewhere. These two countries also show significant negative residuals for many of the Euclidean topics, indicating that these topics are not given much importance at the Population A level. In fact, except for the topic "angles", Belgian and French teachers reported that many of those topics did not form part of their geometry curriculum prior to the Population A level either.

Teaching Practices Employed

These curricular disparities are reflected in the achievement results. Consider, for example, Item 122, shown below. The item deals with the sum of the angles in a triangle, and is a typical item of the kind included in an introductory treatment of Euclidean geometry at this level.

Posttest performance on this item was very high in Japan, at 89 percent, and fairly good in Canada (B.C. and Ontario), New Zealand, and Thailand, where almost all teachers reported teaching this topic. Substantial growth was also reported in Canada and New Zealand. Students in Belgium (Flemish) and France did less well: 63 percent and 55 percent, respectively. In these two places, almost half the teachers indicated that this material had not been taught.

In the United States, where fairly strict streaming of students into different mathematics courses is widely practised at this level, posttest achievement on the item was low and only half the teachers reported teaching the material. In other words, although the United States results were very similar to those from Belgium and France on this item, the factors underlying those performance levels were very different.

Achievement levels on the four items dealing with transformational geometry in a fairly formal way were quite poor, even in Belgium and France, where a transformational approach is emphasized. For example, on Item 173, shown below, the highest posttest score was only 20 percent correct. Students in France and Belgium seemed to find these items as difficult as students elsewhere did, in spite of the reported emphasis in the curricula of those countries.

When these data are combined with a description of the basic instructional approach to geometry taken by teachers, yet another indication of the disparity that exists among countries with respect to the geometry curriculum becomes clear. Teachers in Belgium, France, New Zealand and Thailand favor a transformational approach. In New Zealand, the approach is characterized as an informal one, whereas it is much more formal in the other three. North American teachers are much more likely to use an informal Euclidean or coordinate approach to geometry, and not to stress formal proof at all. In Japan, the approach is Euclidean, but there is some ambivalence about the degree of rigor used.

There is also some apparent ambivalence in the opinions expressed by teachers in certain countries regarding the best way to teach geometry at this level. For example, in spite of the relatively formal nature of their instructional approach and curriculum, about 60 percent of Belgian, French and Thai teachers agreed that "An intuitive approach to geometry is more meaningful to students at this grade level than a formal approach." Moreover, although a majority of teachers in these three countries agreed that it was desirable to follow an axiomatic approach, there was not a strong consensus of opinion to that effect.

The Role of Proof in the Geometry Curriculum 

A clear difference of opinion exists on the appropriateness of proving theorems for students of this age. Teachers from Canada, New Zealand, and the United States are much more likely to agree that such activity should be postponed to a later grade when students are older and, presumably, more mature. Teachers from the other countries, and particularly those from France, are much less likely to agree with that opinion. Students' achievement levels on items involving proof in geometry were quite low in all countries, and it seems evident that students at this age level find such reasoning difficult.



Table 3
Geometry Topics Taught or Reviewed
(Median Polish)

Topic               BFL  CBC  CON  FRA  JPN  NZE  THA  USA  Row Effects
Angles              -16    0    0  -60   24   13  ?12    0   31
Transformations      61  -41   -4   39   10    4  -21  -36   -1
Vectors             112  -30   -8   66   -2    ?    4  -21  -29
Pythagorean Thm.      0   33    5  -51  -21  -12   33   17   -7
Triangles           -30    3   -2    0  -17   10    0   -4   30
Polygons            ?15    1   -1    8  -10    9  -20    0   20
Circles             -28   -4    4   11    7  -44  -26    2   21
Congruence          -14   12   13  -20  -20  -11   28   12    7
Similarity           -4    5   13  -51  -26  -10   34   13    0
Parallel Lines       40   -7   -4    7   -6    0   -1   -4   33
Spatial Relations    14   -6    2  -12   72    0    8   -7  -19
Solids                0   -4    4  -14   54  -13   24    8   -2
Constructions         0    0    4   17   32  -19  -16  -20   23
Proof               10?  -26   -5   64   11  -10   51  -21  -31
Coordinates           0    5   -6   15    0   26  -13   10  -14
Column Effects        ?    8    6    0  -25   -4    6    0

? marks a character or value that is illegible in the source.


Item 122

           Pretest (%)                 Posttest (%)
Country    Correct  Incorrect  Omit    Correct  Incorrect  Omit    Change
BFL           61       31        7        63       33        5        1
CBC           47       38       15        73       22        6       26
CON           53       41        5        72       26        2       19
FRA           49       31       20        55       30       15        6
JPN            -        -        -        89       10        1      n/a
NZE           58       41        1        75       25        0       17
THA           65       34        1        72       28        0        7
USA           37       58        5        56       42        1       19

[Diagram not reproduced; its surviving labels read 60 and 70.]
... is equal to
A 75
B 70
C 65
D 60
E (illegible)

Item 122



[Stem only partly legible: it concerns two vectors, u and v, and asks which diagram below represents u - v. Diagrams not reproduced.]

Item 173



Summary 

There is a remarkable degree of consistency among the teachers who participated in this study regarding the methods and materials to be used in the teaching of mathematics and in their opinions about issues in mathematics education. As an example of the latter, teachers repeatedly and universally disagreed with all of the statements on the questionnaires which suggested that calculators should be used extensively in mathematics classes at this level.

There is also an apparent consensus among teachers in all of these countries, except France, that students need to be taught the same material over and over again. No matter what the topic, very few teachers said that they took it for granted that students had encountered and mastered this material in an earlier grade or grades. Only teachers in France reported doing so with any degree of frequency.



The implications of such a practice for the teaching of mathematics are enormous. If teachers believe that they cannot assume that students have mastered and retained material which they have seen in previous grades, then a tremendous amount of reviewing must be done. Such a practice would seem to be wasteful in terms of the amount of time consumed, and stultifying for students who have to work through the same material over and over each year.

Previous studies of teaching practices conducted in North America have concluded that the teaching of mathematics is largely a teacher-directed, "chalk-and-talk" affair (Romberg and Carpenter, 1986). The results of this study add further confirmation to this conclusion. There are many instances in the data where teachers indicated agreement with a statement, but reported doing exactly the opposite in practice. For example, they agree that having students measure and explore is an important activity in the teaching of measurement, but they also say that they do not put this opinion into practice.

The reasons for this lack of congruence between 
opinion and practice are unclear. It may be that 
teachers find themselves so pressed for time to complete 
the prescribed curriculum that they cannot afford to 
devote any extra time to laboratory-like approaches. 
Or, it may be that they are unwilling to do so. 
IDENTIFICATION AND DESCRIPTION OF OPPORTUNITY
TO LEARN AND GROWTH IN ACHIEVEMENT 

Richard G. Wolfe 
Mathematics is a topic that is mostly learned in school, so the context for assessing mathematics achievement needs to be the teaching and learning environment of the mathematics classroom. The IEA Second International Mathematics Study (SIMS) looked at mathematics achievement and its environment from the perspectives of:

1. the intended curriculum, defined by national and local syllabuses, guidelines, and regulations, by the contents of textbooks, and by school structures including tracking and retention;

2. the implemented curriculum, defined by teachers' reports of their individual goals and attitudes, of their use of instructional resources, of their teaching methods, and, especially, of the actual time spent and specific mathematical material covered; and

3. the attained curriculum, defined in terms of what mathematics knowledge students acquire and also their attitudes toward mathematics and mathematical study.

The SIMS survey was carried out in the early 1980s in some twenty countries. Two levels of school mathematics were studied: Population A, corresponding to the grade in which the modal student age was 13 years, and Population B, corresponding to students specializing in mathematics in their final year of secondary school. The surveys included extensive background, attitude, and pedagogical questionnaires for teachers, school principals, and students, in addition to student achievement testing. The SIMS is partly a replication of an earlier international study, described by Husén (1967), that was carried out in the early 1960s in twelve countries.

In most developed countries, Population A is the last level of schooling where education, and particularly mathematics education, is essentially universal: most 13-year-old children are still in school and still taking mathematics. There are, however, important differences within and between countries in what mathematics is taught and how it is taught. In some contexts there is repetition of earlier instruction in arithmetic. In other contexts, there is introduction of new topics, especially algebra and geometry. There is variation in the extent of abstraction and symbolism used in presenting mathematical ideas.

This paper focuses on Population A results obtained within SIMS for eight "countries" (Flemish Belgium, British Columbia, Ontario, France, Japan, New Zealand, Thailand, and the United States of America) that used the full methodological design of the SIMS, including:

1. longitudinal achievement testing: the students were pretested at the beginning of the school year and posttested at the end of the school year, using a pool of 176 or 180 mathematics items (through a test form rotation scheme, not all students had to answer all items);

2. opportunity-to-learn measurement: the teachers of the sampled classrooms indicated for each test item in the pool whether their students had the opportunity to learn the mathematics necessary to give a correct answer; and

3. classroom process description: special questionnaires were filled out by the teachers during the school year to provide rich description of classroom processes, concerning both methods for teaching specific mathematics topics and general pedagogical styles.
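The bookkeeping behind the test form rotation can be sketched briefly. This sketch assumes the 180-item layout shown in Figure 1 (a 40-item core plus four rotated forms, R1 to R4, of 35 items each) and a simple cyclic ("spiralled") hand-out of forms down a class list; the assignment rule and the name `assign_forms` are hypothetical, since the text does not say how forms were distributed within classrooms.

```python
from collections import Counter

CORE_ITEMS = 40
ROTATED_FORMS = {"R1": 35, "R2": 35, "R3": 35, "R4": 35}

# Items covered across all students, and the load on any one student per sitting.
POOL_SIZE = CORE_ITEMS + sum(ROTATED_FORMS.values())          # 180 items
ITEMS_PER_STUDENT = CORE_ITEMS + max(ROTATED_FORMS.values())  # 75 items


def assign_forms(student_ids, forms=tuple(ROTATED_FORMS), offset=0):
    """Hypothetical cyclic rule: the k-th student gets form (k + offset) mod 4.

    Every student also takes the core; only the rotated form varies.
    """
    return {sid: forms[(k + offset) % len(forms)] for k, sid in enumerate(student_ids)}


# A class of 30 students splits roughly into quarters across the rotated forms.
counts = Counter(assign_forms(range(30)).values())
print(POOL_SIZE, ITEMS_PER_STUDENT, sorted(counts.items()))
# 180 75 [('R1', 8), ('R2', 8), ('R3', 7), ('R4', 7)]
```

Under such a scheme each rotated-form item is answered by about a quarter of the students, consistent with the later remark that an item placed on a rotated form "was given to 25 percent of the sample".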

These are important methodological innovations in large-scale, international educational surveys (or, for that matter, in national or local studies) and allow detailed description of what is taught and learned in one year, disentangling that from cumulative knowledge gained over a student's school career. It is also possible to make correctly specified correlations of within-year learning with within-year classroom characteristics and processes.

In Japan, nearly all the teachers (93 percent) consider the ratio item in Figure 2 to be old content, and while the students perform rather well on the item, there is no growth over the school year: in the pretest 63 percent get the item right, and in the posttest 62 percent.
Issues Concerning the Research Design

The array of data for the SIMS longitudinal, classroom process Population A study is shown in Figure 1.

In each country, a complex sample was drawn, starting with basic stratification of schools according to jurisdictional or geographical categories. The general pattern was then to sample schools within stratum with probabilities proportional to size or estimated size, to sample two classrooms at random from each sampled school, and then to regard as the final sampled units the teacher and all the students of the selected classrooms. The final sizes of the sample varied by country from 93 to 365 classrooms and from 2567 to 8778 students. The basic survey statistics (viz., the percentage of correct item responses) have standard errors of 1 or 2 percent, as estimated from the variability of classroom and school means.
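The two-stage selection just described (schools with probability proportional to size, then two classrooms at random per school) can be sketched as follows. The text does not name the PPS procedure; systematic PPS selection along a cumulated size list is assumed here as one common choice, and the school names, size measures, and helper names are all hypothetical.

```python
import random


def pps_systematic(sizes, n, rng):
    """Systematic PPS selection of n units from a dict {unit: size measure}.

    A random start in [0, step) and a fixed step of total / n are laid along the
    cumulated sizes; a unit is selected once for each point that falls in its
    interval, so a unit larger than the step can be selected more than once.
    """
    total = sum(sizes.values())
    if n <= 0 or total <= 0:
        raise ValueError("need n > 0 and a positive total size")
    step = total / n
    start = rng.random() * step
    points = [start + k * step for k in range(n)]
    chosen, cumulative, p = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        while p < n and points[p] < cumulative:
            chosen.append(unit)
            p += 1
    # Guard against floating-point rounding leaving the last point unassigned.
    while p < n:
        chosen.append(unit)
        p += 1
    return chosen


def sample_classrooms(classes_by_school, schools, rng, per_school=2):
    """Second stage: simple random sample of classrooms within each selected school."""
    return {school: rng.sample(classes_by_school[school],
                               min(per_school, len(classes_by_school[school])))
            for school in schools}


rng = random.Random(1988)
# Hypothetical stratum: four schools with enrolment-based size measures.
schools = pps_systematic({"A": 500, "B": 100, "C": 300, "D": 100}, n=2, rng=rng)
classes = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"],
           "C": ["C1", "C2", "C3", "C4"], "D": ["D1"]}
print(schools, sample_classrooms(classes, schools, rng))
```

When the size measure is proportional to the number of classrooms in a school, selecting schools this way and then a fixed number of classrooms per school gives every classroom roughly the same overall chance of selection, which is one reason the design is popular.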

The research design is discussed fully in Burstein 
(1988). For the purposes of this paper, we need to 
consider the critical issue of the definition of the cog- 
nitive achievement measures. 

In an international educational survey, the achievement tests are inevitably compromises, because national curricula vary significantly in content and emphasis. In some countries, notably the United States of America, there is substantial curriculum variation within the country. An initial focus of the SIMS project was to describe and report the curriculum variation, and a book of analysis has been prepared (Travers & Westbury, 1989). The mathematics item pools and classifications were derived from the analysis, and the final selection of items for the international testing was determined by ensuring that, for each country, its most important Population A contents were included, and that, over all countries, more items were used for content areas that were important in a majority of countries. This works reasonably well in that there are some contents that are taught almost everywhere: basic arithmetic including fractions, the concepts of integers, methods for handling ratio, proportion and percent, and some beginning algebra. On the other hand, there are topics that are not taught in some countries: for example, square root is not taught at this level in Japan. And some topics are taught with special content and emphasis: for example, in France and Belgium, geometry is taught from a formal, transformational perspective.

Figure 1. Data array for the longitudinal, classroom process component of the IEA Second International Mathematics Study.
[Figure not reproduced. Recoverable content: Population A, students in the grade with modal age of 13 years, eight countries participating. Instruments: a School Organization Questionnaire; a Teacher Background, Attitudes, Teaching Practices Questionnaire; extensive classroom process questionnaires on fractions, geometry, ratio, proportion and percent, measurement, algebra, and general classroom practices; a Student Background, Attitudes Questionnaire; and the mathematics tests. Each teacher indicated OTL for each of the 180 items. The tests consisted of a core (40 items) and four rotated forms, R1 to R4 (35 items each); each student answered the core and one rotated form at the beginning of the school year, and the core and one rotated form at the end of the school year.]



Such differences can make international comparisons of achievement quite misleading, unless the comparisons are made for specific content areas and are considered relative to degrees of national emphasis and opportunity to learn. And the determination of specific achievements means that the mathematics knowledge domain must be finely articulated and that many mathematics test items need to be employed.



Cognitive Response and Opportunity to Learn 

The basic findings of the longitudinal SIMS survey are to be found in the item-by-item tabulations of cognitive response and opportunity to learn. An example for an item in the Ratio-Proportion-Percent topic is given in Figure 2. For Japan, this item was part of the pretest given to all students and was on a rotated form for the posttest, so it was given to 25 percent of the sample. In the other countries it was on the core test and so was taken by all students at both pretest and posttest.



A painter is to mix green and yellow paint in the ratio of 4 to 7 to obtain the color he wants. If he has 28 L of green paint, how many liters of yellow paint should be added?

a. 11 L
b. 16 L
c. (illegible)
d. 49 L
e. 196 L

                   Teacher report of OTL (%)      Pretest (%)          Posttest (%)
Country            Previous  New  Not taught      Right  Wrong  Omit   Right  Wrong  Omit
Belgium Flemish       30      30      40            60     36     4      61     37     2
British Columbia       1      78      21            44     47     9      55     39     6
Ontario                3      91       6            40     56     5      58     40     2
France                36      48      18            44     43    13      56     38     ?
Japan                 93       5       2            63     35     2      62     37     2
New Zealand            5      36      59            37     62     1      45     54     1
Thailand               2      93       5            51     49     1      64     36     0
U.S.A.                 6      83      11            33     63     4      43     55     ?

? = value illegible in the source.

Figure 2. Opportunity to learn and pretest and posttest achievement across countries on one ratio question.
In Thailand and Ontario, we see the opposite circumstance. In Thailand, nearly all the teachers (93 percent) consider the item to represent new content that was taught during the year, and there is student cognitive growth from 51 percent correct in the pretest to 64 percent correct in the posttest. Similarly, in Ontario, 91 percent of the teachers taught the mathematics for this item as new material, and the students showed growth from 40 percent correct in the pretest to 58 percent correct in the posttest.

The United States of America, British Columbia, and New Zealand show few teachers who regard this item as representing old content (6 percent, 1 percent, and 5 percent) and progressively decreasing percentages of opportunity to learn this item as new material (83 percent, 78 percent, and 36 percent). The student achievements and levels of cognitive growth are correspondingly low: 33 percent, 44 percent, and 37 percent at the pretest, going to 43 percent, 55 percent, and 45 percent at the posttest.

The results are more confusing for Belgium Flemish and France, because some teachers regard the content as old and others regard it as new. The students in Flemish Belgium perform well on the item but show no growth (60 percent correct on the pretest, 61 percent on the posttest); the teachers seem to be split evenly between regarding the item as old content, as new content taught, or as content not taught. In France, nearly half (48 percent) of the teachers report teaching the mathematics for the item, but another 36 percent regard the item's content as old, and the students show some growth, from 44 percent correct at the pretest to 56 percent correct at the posttest.
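The comparisons in the last few paragraphs can be reproduced from the Figure 2 numbers. This sketch orders the countries by the percentage of teachers who taught the ratio item as new content and reports the change in percent correct; the dictionary layout and the function name are illustrative, and the differences are simple differences of rounded percentages with no significance test.

```python
# Figure 2, ratio item: teacher OTL reports (%) and percent right at pretest/posttest.
# Tuple layout: (previous content, new content, not taught, pretest right, posttest right)
FIGURE_2 = {
    "Belgium Flemish": (30, 30, 40, 60, 61),
    "British Columbia": (1, 78, 21, 44, 55),
    "Ontario": (3, 91, 6, 40, 58),
    "France": (36, 48, 18, 44, 56),
    "Japan": (93, 5, 2, 63, 62),
    "New Zealand": (5, 36, 59, 37, 45),
    "Thailand": (2, 93, 5, 51, 64),
    "U.S.A.": (6, 83, 11, 33, 43),
}


def growth_by_new_content(data):
    """Return (country, % taught as new, change in % right), most 'new' first."""
    rows = [(country, new, post - pre)
            for country, (_prev, new, _not, pre, post) in data.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for country, new, change in growth_by_new_content(FIGURE_2):
    print(f"{country:<18} new content {new:>3}%   change {change:+d} points")
```

Sorting on the "new content" share puts Thailand and Ontario, with clear gains, at the top and Japan, whose teachers regard the item as old content and whose students show no gain, at the bottom.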

Informal Transformations in Geometry

All of the mathematics testing in SIMS was done within a five-alternative, multiple-choice format. While the validity of the interpretation of the item response and its correlates depends primarily on the logical and empirical connections made between the mathematics test item and the mathematics curriculum (intended and implemented), the interpretation also hinges on an understanding of the students' response processes, which are as much psychological as mathematical. The parameters of these processes may be affected by, and may change during, the year of instruction. The multiple-choice response mode imposes inherent limitations on how much one can tell about how a student responds.

In particular, in making international comparisons, one must consider how the item response patterns vary between countries. A major point of difference is the tendency for students in some countries to omit responding when they are evidently unsure of their knowledge, contrasted with the tendency of students in other countries to try to answer each question, perhaps by "guessing". The international instructions did not advise students to guess, nor threaten any "correction" in the scoring, but simply stated that these were international tests and that some items would be unfamiliar to them.

The students in France were inclined to omit responses, with the omission rate approaching 50 percent for some items. According to the French study director, students in France are expected to be able to defend their answers: guessing would not be considered appropriate. The omission rate in Thailand is, on the other hand, less than 1 percent for most items. A more detailed analysis of the Thai data has shown little correlation between wrong responses at the beginning of the year and wrong responses at the end of the year: that is, students must feel obliged to answer each question and are guessing when they do not know the answer. For the countries with omission rates between these extremes, there is some evidence for systematic misinformation (viz., the same wrong response at the beginning and end of the year) and some evidence for seemingly random responses. But there is no justification for a general "correction for guessing" adjustment to the response data.
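One way to quantify systematic misinformation against seemingly random responding is to ask, among students who answered an item wrongly at both testings, how often they chose the same wrong option. The text does not give the statistic used in the Thai analysis, so this is an illustrative sketch: the function name, the use of `None` for an omit, and the uniform-guessing baseline of 1/4 (one of four distractors in a five-alternative item) are assumptions.

```python
def same_wrong_rate(pairs, key):
    """Among students wrong at both testings, the share choosing the same wrong option.

    pairs: iterable of (pretest_choice, posttest_choice), e.g. ("b", "b");
    None marks an omitted response, and key is the correct option.
    Returns None when no student was wrong both times.
    """
    both_wrong = [(pre, post) for pre, post in pairs
                  if pre is not None and post is not None
                  and pre != key and post != key]
    if not both_wrong:
        return None
    return sum(pre == post for pre, post in both_wrong) / len(both_wrong)


# If wrong answers were spread evenly over the four distractors at each testing,
# the same wrong option would recur by chance about a quarter of the time.
CHANCE_SAME_WRONG = 1 / 4

# Hypothetical responses to an item whose correct option is "d".
responses = [("b", "b"), ("b", "b"), ("c", "b"), ("d", "e"),
             ("a", "a"), (None, "b"), ("b", "d")]
print(same_wrong_rate(responses, key="d"), CHANCE_SAME_WRONG)  # 0.75 0.25
```

A rate close to the chance baseline would resemble the Thai pattern of guessing; a rate well above it would point to systematic misinformation.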

One way to handle the omitting-guessing ambiguity is to preserve, throughout the interpretation, a three-way tabulation of item responses, considering rights, wrongs, and omits at pretest and posttest. This will be illustrated by considering the four items in the SIMS pool that concerned informal transformations in geometry. The items are presented in Figure 3. The mathematics necessary to get the correct answers involves some terminology ("image", "reflection", "translation") and notation (e.g., the use of vertex letters) as well as spatial ability.
A. In which diagram below is the second figure the image of the first figure under a reflection (flip) in a line?
[Diagrams a to e not reproduced.]

B. [Diagram not reproduced.] PQRS is a rectangle. Its image after a transformation is the rectangle P'Q'R'S', as shown above. The transformation used could have been:
a. a rotation about the origin
b. a reflection in the y-axis
c. a translation parallel to the x-axis
d. a reflection in the x-axis
e. a translation parallel to the y-axis

C. [Diagram not reproduced.] △ABC and △A'B'C' are congruent and their corresponding sides are parallel. △ABC maps onto △A'B'C' by a
a. reflection
b. glide reflection (slide flip)
c. rotation (turn)
d. enlargement
e. translation (slide)

D. [Diagram not reproduced.] △PQT can be rotated (turned) onto △SQR. The center of rotation is
a. point P
b. point Q
c. point R
d. point S
e. point T

Figure 3. Four items concerning informal transformations in geometry.
The percentages of right, wrong, and omit sum to 100 percent, and so the response distribution for a given population at a given time can be plotted as a point in the equilateral triangle of a barycentric coordinate system. The corners of the triangle represent 100 percent omit, 100 percent wrong, and 100 percent correct. Each item is represented as a pair of points, corresponding to the response distribution at the beginning and at the end of the school year. The barycentric graphs for all eight countries appear in Figure 4. (Note that in Japan and British Columbia, beginning-of-year data were not collected for these items.) The corresponding figures, including opportunity to learn, are given in Table 1.
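A barycentric plotting position is simply a weighted average of the triangle's corners, with the right, wrong, and omit percentages as the weights. This sketch places the corners as in the Figure 4 note (wrong at the left, right at the right, omit at the top) on a unit-sided equilateral triangle. The exact coordinates, the rescaling of totals that round to 99 or 101, and the use of the Item 122 figures for New Zealand as input are illustrative choices.

```python
import math

# Corners of a unit-sided equilateral triangle, laid out as in the Figure 4 note.
WRONG_CORNER = (0.0, 0.0)               # left: 100% wrong
RIGHT_CORNER = (1.0, 0.0)               # right: 100% right
OMIT_CORNER = (0.5, math.sqrt(3) / 2)   # top: 100% omit


def to_triangle(right, wrong, omit):
    """Map a right/wrong/omit distribution (in percent) to a 2-D plotting point."""
    total = right + wrong + omit  # nominally 100; rescale in case rounding gives 99 or 101
    weights = (right / total, wrong / total, omit / total)
    corners = (RIGHT_CORNER, WRONG_CORNER, OMIT_CORNER)
    x = sum(w * cx for w, (cx, _) in zip(weights, corners))
    y = sum(w * cy for w, (_, cy) in zip(weights, corners))
    return x, y


# Item 122, New Zealand: pretest 58/41/1, posttest 75/25/0 (right/wrong/omit).
start = to_triangle(58, 41, 1)
end = to_triangle(75, 25, 0)
print(start, end)  # the point moves right (more correct) and down (fewer omits)
```

The segment from `start` to `end` is the kind of line drawn for each item in Figure 4: movement to the right means more correct answers, and movement down means fewer omits.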





Figure 4. Beginning of the year to end of the year change in right-wrong-omit proportions for informal transformations in geometry items, by country.

Note: Barycentric coordinates are used; the left corner is 100% wrong, the right corner is 100% right, and the (cut-off) top is 100% omit. In CBC and JPN, "x" shows end of year results. Otherwise, the lines show the shift from the beginning to the end of the year.
Table 1
Student Achievement and Opportunity to Learn: Informal Transformations in Geometry

Item and     Student Achievement Results (%)             Teacher Reports of OTL (%)
Country      Pretest               Posttest
             Right  Wrong  Omit    Right  Wrong  Omit    Previous Content  New Content  Not Taught

[Table entries not recoverable.]
Much of the story can be told by considering opportunity to learn, although the relatively high rate of correct response to some items (e.g., item A in Belgium Flemish) without apparent benefit of instructional opportunity suggests that common-sense answers were successful. In New Zealand more than in other countries, the rotation and reflection items (A, B, and D) are taught and substantial growth takes place, while the translation item (C) is less often taught and less often learned. In the U.S.A., Ontario, and Thailand, there is less opportunity and less achievement. It is impossible to say whether the Japanese students acquired their high achievement, or the British Columbia students their mixed achievement, during the year or prior to the year, since there are no pretest data.

The French and the Belgium Flemish responses are interesting because these countries have distinctive geometry curricula, involving not just informal transformations of this sort, but also formal transformational geometry. Items B and C contain the most formal terminology, and the French students show a great shift in their response: they get the items wrong rather than omitting them! Items A and D involve only a little terminology, and the students show better achievement. Item A is reported by the teachers to be taught, and there is a lot of growth. In the case of Belgium Flemish, where students study vectors, only item C shows substantial growth, and that might well be explained through transfer of knowledge.

These student achievement data on geometric transformations can be compared with the teachers' opinions expressed in reaction to the proposition: "Geometry should be taught mainly through transformations (flips, turns, stretches)." The proportion of teachers agreeing or strongly agreeing was as follows:



Belgium Flemish 7%
British Columbia 3%
Ontario 8%
France 21%
Japan 26%
New Zealand 54%
Thailand 46%
U.S.A. 3%



The opinions of the New Zealand teachers especially seem to be put into practice and to affect student learning, while the opinions of the Thai teachers are not in accord with the student data.

Growth in Mathematics Achievement 

From the geometry analysis, we can see that achievement in mathematics and growth in achievement can be very specific: in the particular educational environment of a country, some items from a small, presumably homogeneous set are learned and others are not, and when we shift our attention to the educational environment of another country, there is a reordering of what is learned. These specificities of learning evidently depend on the specificities of opportunities to learn and on the emphasis given to different mathematical contents and perspectives. Furthermore, the psychology of the item response, or non-response, between countries and from the beginning to the end of the school year makes comparisons difficult. All this makes us despair of our ability to aggregate the achievement results over items to form meaningful subtest scores for international comparison. Certainly a "total" score would be nonsensical.

One solution is to keep the analysis at the item level and to look over mathematical topics (and eventually over countries) for instances of high achievement and growth.

The tracking of growth will be illustrated with the results from the "core" mathematics test in the United States of America. This test consisted of 40 items, stratified into 8 items from each of 5 content areas: fractions, ratio-proportion-percent, algebra, geometry, and measurement. All students were expected to take the core test at the beginning and the end of the school year. In fact, the sample size of those who did was 4399.

Because the same items were answered at each time point, the cross-tabulation can be made of right and wrong by beginning and end of year. This leads to four proportions that characterize an item's initial difficulty and its growth: the proportion of students who got the item wrong both times; the proportion of students who got the item right the first time but wrong the second time; the proportion of students who got the item wrong the first time and right the second time; and the proportion of students who got the item right both times.
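With matched right/wrong scores for the same students at the two testings, the four proportions are a plain cross-tabulation. In this sketch each score is `True` for right and `False` otherwise. The text does not say how omits were handled here, so counting an omit as not right is an assumption, as are the function and key names. The net gain in proportion correct equals the wrong-then-right share minus the right-then-wrong share.

```python
from collections import Counter


def transition_proportions(pre, post):
    """Cross-tabulate matched pretest/posttest scores (True = right) for one item."""
    if len(pre) != len(post) or not pre:
        raise ValueError("need matched, non-empty score lists")
    counts = Counter(zip(pre, post))
    n = len(pre)
    return {
        "wrong_wrong": counts[(False, False)] / n,
        "right_wrong": counts[(True, False)] / n,
        "wrong_right": counts[(False, True)] / n,
        "right_right": counts[(True, True)] / n,
    }


def net_gain(props):
    """Posttest minus pretest proportion correct, from the four transition shares."""
    return props["wrong_right"] - props["right_wrong"]


# Hypothetical scores for five students on one core item.
props = transition_proportions([False, False, True, True, False],
                               [True, False, True, False, True])
print(props, net_gain(props))
```

Keeping the two "changing" cells separate, instead of reporting only the net gain, shows whether an item's small net change reflects little movement or many students moving in both directions.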



These proportions sum to 1, and therefore the items can be represented in barycentric coordinates as points in a regular tetrahedron, the corners of which correspond to the hypothetical cases where 100 percent of the students get an item wrong both times, 100 percent get an item right the first time but wrong the second time, and so on. To view the configuration that the points form in the tetrahedron, the Macspin program (Donoho, Donoho, & Gasko, 1986) was used. This program runs on the Apple Macintosh and allows a three-dimensional configuration to be viewed as it rotates around any axis. As soon as motion begins, the eye forms a good picture of the configuration. The static, two-dimensional projections given in Figures 5a and 5b are snapshots taken from several views. Although there are three degrees of freedom in the item statistics, the points closely follow a surface with two degrees of freedom. The program was used to focus on that surface, in Figures 5c and 5d, and then to label the points, in Figure 6.
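The mapping from the four proportions to a point in a regular tetrahedron can be sketched as follows. This is a hypothetical illustration, not the computation performed by Macspin: it assumes a tetrahedron whose corners are alternate corners of the cube [-1, 1]^3, and the vertex ordering and the name `to_tetrahedron` are our own choices. Each item's point is the proportion-weighted average of the four corners, so an item that every student gets wrong both times sits exactly on the corresponding corner.

```python
from typing import Sequence, Tuple

# Hypothetical vertex assignment: corners of a regular tetrahedron
# (alternate corners of the cube [-1, 1]^3), in the order
# wrong-both, right-then-wrong, wrong-then-right, right-both.
VERTICES: Tuple[Tuple[float, float, float], ...] = (
    (1.0, 1.0, 1.0),
    (1.0, -1.0, -1.0),
    (-1.0, 1.0, -1.0),
    (-1.0, -1.0, 1.0),
)


def to_tetrahedron(props: Sequence[float], tol: float = 1e-9) -> Tuple[float, float, float]:
    """Map four proportions (barycentric coordinates) to a 3-D point."""
    if len(props) != 4:
        raise ValueError("expected four proportions")
    if any(p < -tol for p in props) or abs(sum(props) - 1.0) > tol:
        raise ValueError("proportions must be non-negative and sum to 1")
    # Weighted average of the corners, sum_i p_i * v_i, one coordinate at a time.
    x, y, z = (sum(p * v[k] for p, v in zip(props, VERTICES)) for k in range(3))
    return (x, y, z)


print(to_tetrahedron([0.25, 0.25, 0.25, 0.25]))  # (0.0, 0.0, 0.0), the centre
print(to_tetrahedron([1.0, 0.0, 0.0, 0.0]))      # (1.0, 1.0, 1.0), the "wrong both times" corner
```

Four proportions constrained to sum to 1 leave three free values, which is why the points live in three-dimensional space and why a rotating 3-D display is a natural way to inspect them.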



One major finding is that growth is small. The reason is certainly not that there is no room for growth, since most of the beginning-of-year results are in the lower or moderate category. There are a few items with spectacular gains, but this provides little comfort when we look at the content of these items. The item providing the largest gain has the following stem: "(-2) × (-3) is equal to ...." That is, students do not know the multiplication rule for negative numbers at the beginning of the year, but they do seem to learn it successfully. The second-highest gainer is: "If x = -3, the value of -3x is ...," which is the same rule with a little notation.

When the points are tagged with content categories, we again see that there is great inhomogeneity of achievement within what were considered to be homogeneous content units.

When the points are divided according to high and low opportunity to learn, the effect of instruction is evident.

Conclusions 

The major finding of the SIMS analysis and survey of mathematics achievement and growth at the critical juncture between elementary and secondary education is that not very much mathematical achievement is taking place. We do see some rather direct connections between curriculum and learning, and so perhaps the conclusion should be that the objectives of the mathematics curriculum are too limited: if more content were introduced, it seems likely that more mathematics would be learned. Furthermore, from analyses in Burstein (1989), we know that the attitudes of the students, shared to a great extent by their teachers and shifting very little during this year of school, are that mathematical formulas and rules and the calculation of answers are what is important. Perhaps if mathematics were cast in a more creative and interesting light, students would like it better and would be motivated to learn more.
Figure 5. Not knowing, knowing, learning, and forgetting mathematics items.

Figure 6. Item not knowing, knowing, learning, and forgetting by content and opportunity to learn.

Note:

a. "Unknown" is an item rarely learned. "Known" is an item usually known at the beginning of the year. "Learned" is an item often learned during the year. "Forgotten" is an item often forgotten during the year.

b. By content of item: □ is measurement; Δ is geometry; × is ratio-proportion-percent; ♦ is fractions.

c. Less than 75% opportunity to learn this year.

d. More than 75% opportunity to learn this year.
THE CURRICULUM OF THE SCUOLA MEDIA SINCE 1979



Raimondo Bolletta 



In this paper I will present the results of a research project called VAMIO (Verifica Abilità Matematiche Istruzione dell'Obbligo: Verification of Mathematical Ability in Compulsory Schooling), which was developed through a course of study for a doctorate in experimental educational research. The project was conducted with funds from the European Center of Education in Frascati.

With the third and final year of the Scuola Media (lower secondary school), 13- to 14-year-olds finish the 8-year program of compulsory schooling in Italy. The school system in Italy is centralized, and there is a national syllabus for each age group up to 14 years of age. There is also a national syllabus for each kind of upper secondary school, but our assessment system is rather informal and not centrally controlled. We do not have any kind of examination board, and all examinations and all types of evaluation throughout schooling are administered directly by classroom teachers. The only form of external evaluation occurs at the end of upper secondary school, for the final Diploma (Maturità). A commission of external teachers appointed by the Ministry of Education assesses the student's achievement on the basis of two written essays and the outcome of an interview covering four subjects: two chosen by the commission and two by the individual student.

There are two main consequences of this situation. The first is positive, in that there is a lot of freedom, and it is possible to introduce (more or less) any kind of innovative teaching experiment we want. On the other hand, it is very difficult for the school system to have documented knowledge of what is really happening in different parts of the country, and to use the same kind of measure and the same standards nationally for the outcomes of the school system.

In 1979 the Ministry of Education carried out a reform of the Scuola Media which changed the syllabuses in all subjects. The new programs in mathematics were particularly innovative. These programs were prepared by a large commission in which many promoters of innovation, both secondary school teachers and university professors, were represented.

It is difficult to summarize in a few words all the rich and interesting aspects of these programs. I shall mention only a few. Mathematics and experimental science are taught by the same teacher. Programs are not prescriptive but suggest some general themes and subthemes. The classroom teacher is given direct responsibility for the choice of specific topics in each area and for the organization and scheduling of classroom activity. Topics such as probability and statistics, logic, and the introduction of geometry through isometric and non-isometric transformations are the main innovations from the point of view of content, whereas from a methodological point of view, particular attention is paid to interdisciplinarity, to the applications of mathematics to reality, and to some simplifications of the algebraic rules of calculation. Set theory is recommended only as one language among others.

Three years later, it was deemed necessary to change the final examinations of the Scuola Media to make them correspond to the new content and methodology. This gave rise to a wide debate on the best ways to assess changes in students' performance and, more generally, on the effectiveness of the new programs.

Widespread discontent was perceptible: among teachers of the Scuola Media because the syllabus was too ambitious and too vast, and among upper secondary school teachers because levels of student achievement were decreasing. There was general agreement that it was very difficult to implement revised programs in the Scuola Media while both the elementary school programs (last reformed in 1952) and the upper secondary school programs remained antiquated.

The idea of the VAMIO survey sprang up in this context and addressed these kinds of problems. The principal aim was to produce a standardized test to help evaluate levels of achievement of single classes or individual students at the end of the Scuola Media, and to diagnose the real preparation of students and any need for remedial work at the beginning of upper secondary school. To reach this aim, it was necessary to investigate the effectiveness of the programs more deeply and to learn more about the actual implementation of the mathematics curriculum.

The problem was to find a simple way to collect data on the actual interpretation and implementation of the official programs. For this purpose, we based ourselves on the methodology of the IEA surveys, in particular on the preliminary studies of the intended and implemented curricula.

Is it possible to obtain reliable indications about actual activities inside classes directly from teachers? Are they good judges and impartial observers of the class situation? How should one define, elaborate and use the variable "Opportunity to Learn"? How should one describe the content of programs in a clear and understandable way? Is it possible to have instruments for measuring the amount of innovation promoted by the program and actually implemented?

Sample 

We surveyed a national sample of 1300 teachers by questionnaire on the "actual" program implemented in the classroom and, one year later, studied the achievement of an independent sample of 2800 students by means of a multiple-choice test.

It seemed that the crucial variable was the teacher. We surveyed a representative sample of mathematics teachers of the Scuola Media by questionnaire. Each teacher was asked about three clusters of variables: the first refers to teacher characteristics (sex, age, type of degree, place of residence, textbook used, general teaching attitudes); the second concerns the program actually developed in the classroom; and the third is the "opportunity to learn", which referred to a set of items.

The scholastic program was described through a list of contents, about 130 topics, and for each one the teacher had to state its relevance in terms of time spent in class to develop it, the grade in which it was normally developed, and the level of difficulty for students to learn it. The relevance variable was expressed on a six-value scale: 0 = the content was not taught; 1 = brief comments in one or two lessons; 2 = general but synthetic treatment; 3 = thorough treatment in 5-8 lessons, possibly spread over different years; 4 = systematic and repeated treatment; 5 = the content was developed with particular care in 20-30 lessons during the three-year course of the Scuola Media. As a control on these indications, for a set of about 140 items we have the values of the "opportunity to learn" variable, which is defined as the predicted percentage of students able to answer the item correctly.

Teachers responded positively to this survey. Eighty-nine percent of the schools invited to participate accepted, and 91 percent of the teachers in the accepting schools answered the questionnaire. Checks of coherence among different variables show that the quality of the answers was good. In particular, it does not seem that teachers gave a biased or optimistic image of the real activity in the classroom.

Looking only at the distributions and modal values of the relevance variable, it is possible to draw an interesting map of the syllabus which splits the contents into three clusters: the first containing topics whose modal value of relevance is 4, the second containing topics whose modal value is 0 or 1, and the third containing the remainder of the list. Looking at these three parts of the list, it seems that actual syllabuses are considered too vast and each teacher decides what part of the program he or she should develop. Although a large majority of teachers agree on a core program which contains the most traditional topics, they eliminate many topics that are too innovative or too difficult for most students (excluded program). The remaining topics are developed optionally, only by small numbers of teachers (optional program).
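The three-way split by modal rating can be expressed as a small Python function. This is a sketch under stated assumptions, not the VAMIO procedure: it takes a topic's six frequencies for relevance ratings 0 to 5 (the layout of Tables 1 to 3), breaks ties toward the lower rating, and places a mode of 5 in the remainder; the function names are hypothetical.

```python
from typing import Sequence


def modal_rating(freqs: Sequence[float]) -> int:
    """Relevance rating (0-5) with the largest frequency.

    Assumption: ties go to the lowest rating, because max() returns the
    first maximal element it encounters.
    """
    if len(freqs) != 6:
        raise ValueError("expected frequencies for relevance ratings 0-5")
    return max(range(6), key=lambda rating: freqs[rating])


def classify_topic(freqs: Sequence[float]) -> str:
    mode = modal_rating(freqs)
    if mode == 4:
        return "core"
    if mode in (0, 1):
        return "excluded"
    return "optional"  # modes 2, 3 and (by assumption) 5


print(classify_topic([0, 1, 6, 21, 50, 22]))  # core (Theorem of Pythagoras, Table 1)
print(classify_topic([8, 60, 24, 4, 4, 0]))   # excluded (Ancient number systems, Table 2)
print(classify_topic([4, 26, 43, 16, 10, 1]))  # optional (Inscribed angles and central angles, Table 3)
```

Using only the mode keeps the rule simple, but it ignores how spread out the ratings are; a topic rated 4 by a bare plurality lands in the core just as firmly as one rated 4 by most teachers.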



Table 1

Topics of the Syllabus Most Often Covered (Core Syllabus)

Frequencies (%) by relevance rating

0   1   2   3   4   5   Topic

GEOMETRY (THE FIRST REPRESENTATION OF THE PHYSICAL WORLD) [A]
2   19  18  19  33  10  01 Study of plane figures arising from models of nature
1   14  19  17  41  7   03 Drawing plane figures
1   25  25  12  33  4   04 Nomenclature relative to polygons
2   1   3   17  57  20  06 Calculation of perimeters and areas of quadrilaterals
1   4   14  22  47  13  10 Figures with equal areas
1   7   24  28  32  8   12 Study of regular polygons
0   1   6   21  50  22  13 Theorem of Pythagoras
0   0   1   10  54  36  14 Application of Pythagoras' Theorem in the solution of geometric problems
12  27  17  10  29  6   15 Use of straight edge, square and compass in geometric constructions
1   17  21  16  39  ?   24 Study of solid figures arising from models of nature
5   21  23  15  33  4   25 Regular polyhedra
1   6   26  23  40  3   28 Cube
0   5   25  23  44  4   30 Parallelepiped
1   5   24  24  44  3   31 Prism
1   5   21  26  45  4   32 Pyramid
1   4   23  24  44  4   33 Cylinder
2   4   23  24  44  4   34 Cone
7   11  20  19  38  5   37 Composite solids

NUMERICAL SETS [B]
1   14  22  19  35  10  01 Set of natural numbers
1   13  26  23  33  5   03 Decimal metric system
1   1   7   24  56  11  05 Operations with signed numbers
1   15  24  20  34  6   06 Comparison of signed numbers
0   16  25  18  35  5   07 Graphical representation of signed numbers
2   4   10  25  45  14  08 Fraction as an operator
1   8   20  26  39  7   09 Equivalent fractions
?   5   17  24  44  10  10 Concept of ratio
1   3   15  23  46  11  12 Expressions with rational numbers
0   1   7   29  53  10  13 Proportions
0   5   17  27  45  6   14 Solving for an unknown in a proportion
1   2   10  23  54  10  15 Application of proportions in the solution of problems
1   7   17  21  43  11  22 Direct and inverse numerical operations
1   8   22  27  35  7   23 Properties of numerical operations
0   4   17  34  39  6   24 Raising to a power
1   5   23  33  34  5   27 Common multiples and common divisors of several numbers
0   4   21  36  35  4   28 Prime factorization
0   4   20  36  35  4   29 Rules for the calculation of the GCD and LCM
8   18  24  17  28  5   30 Exercises in exact and approximate calculation
2   16  26  20  33  4   32 Effective use of numerical tables

MATHEMATICS OF CERTAINTY AND MATHEMATICS OF THE PROBABLE [C]

PROBLEMS AND EQUATIONS [D]
3   5   10  14  44  24  01 Recognition of significant information and variables in a word problem
5   15  26  21  27  7   03 Setting up of arithmetic expressions for the solution of a word problem
3   4   11  16  49  18  05 Reading, writing, use and manipulation of simple formulas
1   1   8   29  49  11  06 First-degree equations

COORDINATE GEOMETRY [E]
7   19  22  19  27  7   01 Coordinate geometry in concrete situations
2   10  25  27  30  5   04 Coordinates of a point in the plane
5   6   23  26  35  5   06 Cartesian plane representation of mathematical laws describing real phenomena
3   4   20  29  39  6   08 Cartesian representation of direct proportionality
3   4   20  29  39  5   09 Cartesian representation of inverse proportionality

GEOMETRIC TRANSFORMATIONS [F]

CORRESPONDENCES AND STRUCTURAL ANALOGIES [G]
7   9   25  25  29  6   03 Concept of function

? = value illegible in the original.



Table 2

Topics of the Syllabus Rarely Covered (Excluded Syllabus)

Frequencies (%) by relevance rating

0   1   2   3   4   5   Topic

GEOMETRY (THE FIRST REPRESENTATION OF THE PHYSICAL WORLD) [A]
10  43  30  12  5   1   02 Construction of triangles with material
21  37  22  8   10  ?   05 Using Venn diagrams with sets of polygons
17  39  27  9   8   1   07 Symmetry of the square
21  34  28  9   7   ?   08 Symmetry of quadrilaterals
18  35  29  9   8   1   09 Axes of symmetry of triangles
11  57  20  5   6   ?   11 Convex and concave polygons
10  41  26  9   12  ?   16 The problem of calculating pi
50  28  16  4   3   ?   17 The problem of squaring the circle
5   42  34  11  7   ?   21 Relative positions of two lines in space
7   46  31  8   7   ?   22 Dihedral angles
41  31  18  5   5   ?   23 Angles in a solid
44  27  15  7   7   ?   26 Axes of symmetry of regular polyhedra
59  23  10  3   4   ?   27 Euler's formula for a polyhedron
33  24  18  10  14  ?   29 Plane sections of the cube
26  16  20  15  21  2   35 Spheres
34  23  18  10  14  1   36 Plane sections of the cone and the cylinder

NUMERICAL SETS [B]
8   60  24  4   4   0   02 Ancient number systems
24  35  25  7   8   2   04 Arithmetic of odd and even numbers
21  24  34  15  6   0   19 Base 2
27  27  30  11  5   0   20 Bases other than 10
9   34  30  12  15  1   21 Order of magnitude
32  26  23  10  8   2   31 Successive approximations as an approach to real numbers
39  27  16  7   9   2   33 Use of calculators

MATHEMATICS OF CERTAINTY AND MATHEMATICS OF THE PROBABLE [C]
43  18  21  10  7   1   02 Logical connectives
49  16  20  11  4   1   03 Circuits and switches
36  14  23  15  10  1   04 Logical operations and set operations
29  21  24  14  11  2   09 Density maps
30  27  26  12  6   1   10 Absolute frequency
30  25  27  12  6   1   11 Relative frequency
29  18  25  15  11  2   14 Surveys
35  21  24  12  8   1   15 Phases in a statistical study
70  13  11  4   3   0   16 Discrete and continuous variables
82  7   6   2   3   0   17 Time series
41  18  19  9   12  2   18 Various types of tables
45  2   19  7   5   1   19 Mode
44  23  20  8   5   1   20 Median
58  17  15  6   4   0   21 Weighted arithmetic mean
70  10  11  5   4   0   22 Experimental laws and interpolation
58  20  13  4   4   1   23 Sampling
29  17  27  15  10  2   24 Statistics and probability
64  17  12  4   3   0   25 Simple geometric mean
68  14  11  4   2   0   26 Properties of the arithmetic mean
83  9   4   2   1   0   27 Dispersion
84  8   5   2   1   0   28 Index numbers
57  24  13  3   2   0   29 Gauss' curve
81  11  5   2   1   0   30 Extrapolation
83  7   5   3   2   0   31 Graphic representation of polar coordinates
80  11  5   2   2   0   32 Properties of the median
86  7   5   1   1   0   33 Correlation
90  5   ?   1   1   ?   34 Tables of random numbers
78  12  ?   ?   ?   ?   35 Structure of populations by age grouping
72  16  7   2   2   0   36 Rate of growth of population
70  19  ?   ?   ?   ?   37 Characteristics of census taking
53  22  ?   0   4   ?   38 Frequency

PROBLEMS AND EQUATIONS [D]
26  19  21  13  17  4   02 Flowcharts
48  14  17  11  9   1   07 First-degree inequalities

COORDINATE GEOMETRY [E]
22  30  ?   ?   ?   ?   02 Reading topographical and geographical maps
14  11  27  22  21  4   05 Representation of polygons on graph paper
45  17  10  11  ?   1   07 Graphical representation of exponential growth
23  11  ?   10  ?   ?   10 Cartesian graph of y = x²
36  13  ?   19  ?   ?   14 Condition of perpendicularity of two lines
79  6   ?   ?   3   1   15 Graph of an inequality
82  6   ?   ?   ?   1   16 Applications to problems of linear programming
84  6   ?   ?   ?   ?   17 Analytical study of conic sections

GEOMETRIC TRANSFORMATIONS [F]
3   35  29  15  16  2   02 Use of protractor
11  41  29  9   9   1   03 Construction of an angle bisector
53  12  18  10  7   ?   09 Set of isometries and compositions of isometries
45  16  21  10  8   1   10 Dilations
58  14  16  7   5   0   15 Observation of shadows in the plane
74  10  10  4   3   0   16 Properties of affine transformations
86  5   5   3   2   0   17 Equations of affine transformations
79  6   8   4   2   0   18 Equations of similarity transformations
73  9   10  5   3   0   19 Equations of symmetry with respect to the Cartesian axes or the origin
84  8   4   3   1   0   20 Drawing in perspective
85  8   4   2   1   0   21 Deformed images

CORRESPONDENCES AND STRUCTURAL ANALOGIES [G]
55  9   11  9   14  2   04 Search and discovery of structural analogies

? = value illegible in the original.
Table 3

Topics in the Syllabus of Intermediate Emphasis (Optional Program)

Frequencies (%) by relevance rating

0   1   2   3   4   5   Topic

GEOMETRY (THE FIRST REPRESENTATION OF THE PHYSICAL WORLD) [A]
1   12  32  30  23  3   13 Inscribed and circumscribed polygons
4   36  37  13  9   1   19 Lines tangent to a circle
4   26  43  16  10  1   20 Inscribed angles and central angles

NUMERICAL SETS [B]
1   8   28  31  27  4   11 Percentages
5   27  28  15  23  2   16 Representation of rationals on the number line
2   14  35  26  21  2   17 Decimal form of rational numbers
2   10  37  30  20  1   18 Terminating decimals and repeating decimals
9   15  29  26  18  2   25 Rules for extracting a square root
4   15  36  24  19  3   26 Methods of approximation of a square root

MATHEMATICS OF CERTAINTY AND MATHEMATICS OF THE PROBABLE [C]
15  25  30  ?   13  3   01 True-false statements and probable statements
17  16  29  21  13  3   05 Statistical observation
8   21  32  22  16  2   06 Pie charts
9   26  31  18  15  2   07 Pictograms
7   21  30  23  17  3   08 Histograms
7   11  30  28  21  3   12 Percentages
11  29  33  16  10  1   13 Simple arithmetic mean

PROBLEMS AND EQUATIONS [D]

COORDINATE GEOMETRY [E]
3   16  28  23  26  4   03 Coordinates of a point on a line
10  11  29  21  26  4   11 Equation of a line through the origin
17  13  30  18  20  3   12 Equation of a line parallel to an axis
16  11  30  18  23  2   13 General equation of a line

GEOMETRIC TRANSFORMATIONS [F]
0   9   31  33  24  2   01 Measure of angles
1   19  41  23  15  2   04 Sum of internal angles and external angles of a triangle
24  17  27  18  12  2   05 Rigid motions in the plane
25  16  28  17  12  2   06 Translations
20  16  29  18  15  2   07 Rotations
16  18  31  19  15  1   08 Symmetries
10  8   26  32  23  1   11 Similarity
10  8   26  34  21  1   12 Properties of similar figures
12  9   30  29  19  2   13 Relationship of areas of similar figures
11  16  30  24  17  ?   14 Scale drawing

CORRESPONDENCES AND STRUCTURAL ANALOGIES [G]
16  17  25  18  20  5   01 Concept of relation
13  15  28  20  21  5   02 Concept of correspondence

? = value illegible in the original.



This situation is supported by the students' results: 60-70 percent of the students answered correctly the items which referred to topics included in the core program, whereas only 25-30 percent of students succeeded on items related to topics eliminated from the official syllabus. Responses during the administration of the trial test to 450 students gave further confirmation of this: for each item, students specified whether they had already studied that particular topic. We found the same kind of agreement with the mean value of the "opportunity to learn" (OTL) variable.

With Freudenthal's criticisms in mind (Freudenthal, 1975), particular attention was given to the analysis of OTL. This variable was presented as the predicted percentage of students able to answer an item correctly and, in this sense, as a measure of the teacher's ability to predict the achievement of students. At the same time, it served as a curriculum indicator, on the assumption that the time spent in the classroom is positively correlated with student achievement. To check the first interpretation of the variable (a measure of the ability to predict student achievement), for each item of the test we studied the contingency table between the OTL expressed by the 140 teachers and the mean score obtained on the item by the entire class of each teacher, and we calculated the value of chi-square. We found that the significance of chi-square depended on the item, so we had to find another way to categorize the items. Items referring to topics in the core program show a dependence between the achievement of the class and the OTL expressed by the teacher, whereas for items referring to excluded or optional topics, items with hidden difficulties, and poorly formulated items, the achievement of the class is independent of the teacher's prediction.
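The chi-square test of independence used here can be sketched as follows. This is the generic Pearson statistic, not the author's code; how OTL values and class mean scores are binned into rows and columns is left to the caller and is an assumption, as is the small example table. Converting the statistic into a significance level requires the chi-square distribution with (r - 1)(c - 1) degrees of freedom; a library routine such as SciPy's `scipy.stats.chi2_contingency` returns that p-value directly.

```python
from typing import List, Sequence


def chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for independence in an r x c contingency table."""
    row_totals: List[float] = [sum(row) for row in table]
    col_totals: List[float] = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("empty table")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("zero expected count; merge sparse categories first")
            stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table: Sequence[Sequence[float]]) -> int:
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 2 table: rows = low/high OTL, columns = low/high class mean score.
observed = [[10, 20], [20, 10]]
print(round(chi_square(observed), 3), degrees_of_freedom(observed))  # 6.667 1
```

With only about 140 classes, many cells of a fine-grained OTL-by-score table would be nearly empty, so in practice categories are usually merged before the statistic is computed.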

I must also mention another statistical aspect of the OTL. For each item, the mean OTL is very stable across the two independent samples of teachers: the 1300 teachers surveyed in 1985 and the 140 teachers of the classes tested in 1986.

In order to classify the educational options indicated by the list, we tried to reduce the number of variables in play. This analysis of the information on the curriculum actually covered consists of a factor analysis of the relevance variable.

The Achievement of Students 

The test used in the VAMIO research did not try to propose criteria for evaluating the quality of the innovation actually realized. In fact, as far as possible, it tried to avoid imposing a particular interpretation of the syllabus.

A qualitative analysis of each item and of its statistical characteristics allowed us to discover different levels of preparation of students in different parts of the syllabus, and this served as a check on the information collected through the teacher questionnaire. The results amply confirm what had already emerged from the analysis of the syllabus. But the analysis of errors and the factor analysis of the test also suggested some didactical problems: for example, it seems that the ability to read and interpret a statistical diagram is independent of the ability to work in the Cartesian plane.

Different parts of the program are not well integrated. Because some parts are considered optional, we found that if one part is well developed by a particular teacher, another is less so, and vice versa. For example, two items which refer to numerical ability (the first concerned with the structural properties of numerical sets, the second with the approximate result of a multiplication) correlate in opposite ways with the same factor. It seems that the diverse nature of the items is amplified by the different didactic options which are compatible within the same program.

Test results also demonstrated significant differences among students from different geographical regions. Students from the more industrialized and wealthier North scored higher than their counterparts in the South. This fact, which was already evident in previous IEA studies (Laeng, 1977; Visalberghi, 1977), suggested a further analysis of the data on the implemented curriculum. Comparing the mean factor ratings, the factors related to the most innovative topics rated higher in the North, while those related to traditional topics rated higher in the South.

On the basis of this experience, I think that:

1. it is possible to survey the implementation of a centralized program using cheap and quick instruments for collecting data directly from teachers;

2. in our particular situation in Italy, we must gather and analyze more information about the achievement of large samples of students;

3. it is not possible to introduce innovation simply by writing good syllabuses. We need to consider the entire educational process in order to monitor trends of interpretation, attitudes of teachers, and the achievement of students;

4. the current national program constitutes a conceptually advanced proposal not yet fully developed or implemented. More energy (time and money) must be invested in in-service teacher training, development of educational materials, and research.
THE 1987 APU SURVEYS: SOME PRELIMINARY RESULTS



Derek Foxman • Graham Ruddock



The sixth national mathematics monitoring surveys of 11- and 15-year-olds in England, Wales, and Northern Ireland were carried out in 1987 by the National Foundation for Educational Research in England and Wales, on behalf of the Assessment of Performance Unit at the Department of Education and Science (DES) in Britain. As in the previous phase of annual surveys from 1978 to 1982, a light sampling technique was used, in which each pupil involved took only a fraction of the total assessments used. Modes of assessment relating to new technology and small-group problem solving were included for the first time, as well as the modes used in previous surveys. Some initial analyses of the age 11 survey data have been carried out, which show a picture similar to that of 1982 in the pattern of results, with some shifts in detail.
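Light sampling (often called matrix sampling) can be illustrated with a simple rotation scheme. This is a hypothetical sketch, not the APU's actual design, which is not described here: the pupil identifiers, package labels, and round-robin rule are all assumptions. Rotating packages through the sample spreads them evenly, so every assessment is taken by a comparable sub-sample while each pupil sits only one package.

```python
from collections import Counter
from typing import Dict, Sequence


def assign_packages(pupils: Sequence[str], packages: Sequence[str]) -> Dict[str, str]:
    """Give each pupil one test package by cycling through the packages (round robin).

    Assumes pupil identifiers are unique.
    """
    if not packages:
        raise ValueError("need at least one package")
    return {pupil: packages[i % len(packages)] for i, pupil in enumerate(pupils)}


def package_counts(assignment: Dict[str, str]) -> Counter:
    """Number of pupils taking each package; with round robin these differ by at most one."""
    return Counter(assignment.values())


assignment = assign_packages(["p1", "p2", "p3", "p4", "p5"], ["A", "B"])
print(assignment)                  # {'p1': 'A', 'p2': 'B', 'p3': 'A', 'p4': 'B', 'p5': 'A'}
print(package_counts(assignment))  # Counter({'A': 3, 'B': 2})
```

A real survey would normally shuffle or stratify pupils before rotating, so that package assignment is not confounded with the order in which schools or pupils are listed.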

Surveys of the mathematical performance of pupils in the 11- and 15-year-old age groups in the schools of England, Wales and Northern Ireland are being carried out by the National Foundation for Educational Research in England and Wales (NFER). The NFER is an independent research body funded mainly by the Local Education Authorities in England and Wales and by outside sponsors. It undertakes research and development projects on issues of current interest in all sectors of the public education system.

The monitoring surveys are conducted on behalf of the Assessment of Performance Unit (APU) at the DES and are funded by the DES, the Welsh Office Education Department, and the Department of Education in Northern Ireland. In the APU's monitoring programme the NFER has also undertaken surveys in English Language and First Foreign Language; work in science is based at the University of Leeds and King's College, London University, while Goldsmiths College, London University, is responsible for Design and Technology.

The research teams' work is guided by steering groups consisting of teachers, education authority curriculum advisers, educational researchers and members of Her Majesty's Inspectorate (HMI). The work of the steering groups is supervised by a management team of the APU consisting of a small number of administrators and HMI.



The purpose of the surveys is to provide a national picture of the performance of pupils in the age groups concerned and, over a series of surveys, to monitor changes in performance.

Six mathematics surveys of each age group, 11 and 15, have been mounted. Both groups were surveyed annually from 1978 to 1982, and a further survey was carried out in 1987. Surveys of 11-year-olds are carried out in May, with those for the older pupils taking place in November. Age 11 represents the last year of primary schooling, while the 15-year-olds, in November, are at the beginning of their last year of compulsory schooling.

The Mathematics Assessment Framework

The assessment framework on which the mathematics surveys are based can be seen as having three dimensions:

Content. The mathematical content is covered by five main categories: number, measures, geometry, algebra, and probability and statistics. These categories are further divided into over a dozen sub-categories in total for detailed monitoring purposes. This division into sub-categories differs for the two age groups; number, for example, is represented by a greater number of divisions at age 11 than at age 15, while for algebra the reverse is true.

Context. The contexts in which the mathematics is placed include everyday life, other school subjects such as geography, and that of mathematics itself.

Learning Outcome. Three broad forms of learning outcome are assessed: the understanding of concepts and performance of routine skills; the use of problem-solving strategies; and attitudes to mathematics.

Modes of Assessment: The 1987 Surveys

In the 1987 surveys the assessment modes already developed in the 1978 to 1982 period were used again, together with specially developed new modes which reflected trends in the mathematics curriculum since 1982.

The modes of assessment used from 1978 to 1982 were:



• Written tests of concepts and skills: each test comprised around 50 short-response items, only a few of which were multiple-choice questions. Normally three sub-categories of content were equally represented in each test. Up to 30 such tests were used in a survey.

• Written tests of problem-solving strategies (Problems & Patterns): five or six problems were presented in each test, with a graded set of questions for each. Between 8 and 10 tests were used in each survey.

• Written attitude questionnaires: sections on attitudes to mathematics in general and to particular topics were presented. The scales related to the enjoyment, usefulness and utility of mathematics and mathematical topics. A free-response section was also included.

Concepts and skills, problem-solving strategies, and attitudes were also assessed in practical tests in an oral mode, given in a 1-to-1 interview situation by experienced teachers of the age group who were recruited and trained specially for a survey. The training aimed to produce a high degree of standardisation of presentation, but with some flexibility allowed in a friendly but searching atmosphere. Assessors worked from a "script" containing all the questions to be put to pupils and directions on how the materials to be used should be presented. Prompts or hints were given if prescribed by the script. Flexibility was provided by the freedom given to assessors to ask, in a neutral way, for clarification of a response, whether scripted or not. For example, pupils could be asked how they obtained an answer whether the response was correct or incorrect. Each pupil was given about three topics from up to 15 used in a survey. Assessors recorded in as much detail as was practicable what they and the pupils said and did.



The overall balance of assessments in 1987, as compared with 1982, shifted towards more problem solving and mathematics in context, and towards greater use of new technology: calculators and microcomputers.

More Problems and Patterns tests and fewer Concepts and Skills tests were used than in 1982. Within each assessment mode the role of the calculator was increased. A new number sub-category, Calculator Skills, was introduced into the Concepts and Skills assessments, the existing sub-categories remaining non-calculator based. From 1978 to 1982 calculators had not been allowed in any of the Problems & Patterns tests, whereas in 1987 they were allowed in about one-third of them. In addition, more of the topics in the Practical Tests were calculator based. All of the collections of items were reviewed, revised and updated.

New assessment modes were also introduced for 1987:

• Written Theme Tests: in the theme tests a unifying context, such as the weather or planning a trip, was provided to give a meaningful setting for the tasks. These consisted of short-response items together with a final task requiring the integration of a range of information and previous answers. Calculators were available.

• Small Group Problem Solving Tasks: groups of three pupils of the same sex and similar attainment level worked together on problem-solving tasks, with an assessor recording the activities (Foxman, in press).

• Mathematics with the Micro: individual pupils undertook problem-solving activities on a BBC B or RML Nimbus microcomputer. An assessor recorded the activity on a 1-to-1 basis. This was a probe, using a non-random sample, to illustrate the use of a microcomputer for such assessments.

Table 1

Structure of the 1987 Primary Mathematics Survey

Assessment                                 Sample used                  No. of pupils
Written tests: Concepts and Skills         Whole sample                 10,000
Written tests: Problems and Patterns,
  Theme tests, Calculator Skills test      Sub-sample 1                 4,800
Attitude Questionnaire                     Sub-sample 2*                1,200
Practical tests                            Sub-sample 3*                1,200
Small Group Problem Solving†               Separate sample              800 (270 groups of 3)
Maths with the Micro†                      Non-random separate sample   150

* Sub-samples 2 and 3 overlap.
† These pupils also took a specially constructed Concepts and Skills test.

Survey design in 1987 

Stratified cluster sampling is used for all random samples. About 2 percent of the target population is sampled in England and about 6 percent in Wales and Northern Ireland. The survey designs for the two age groups were similar; that for the younger pupils is shown in Table 1 by way of example.

All pupils took a Concepts and Skills test, and most, in addition, took a second assessment; a few 11-year-olds took three. Since what is required is an overall picture of performance, each pupil took only a small proportion of the items in use and contributed to the overall picture by doing so. A large and representative pool of items can thus be widely sampled. All assessments are taken anonymously. Schools in the sample are asked to complete questionnaires about some aspects of their curriculum and their staffing resources. Other information is requested about the size of classes and how they are organised, especially mathematics classes. The amount of time spent on mathematics in school and on homework is also asked for. Further information about schools' location, size and pupil-teacher ratio is obtained from the DES.

Reporting 

The results are reported in a number of ways. For monitoring purposes, reporting has been by sub-category for the Concepts and Skills tests and against some pupil and school variables. Reporting by topic or by individual item, however, has been more valuable for teaching purposes. A multi-level modelling programme is being used to analyse the 1987 data by background variables (Goldstein, 1986; Hutchison and Schagen, 1987).

The form of reporting for the 1987 surveys has not yet been finalised. Previous reports have included individual reports on each of the first three annual surveys at each age level (Foxman et al. 1980, 1981, 1982) and a Review Report covering the findings of all the annual surveys (Foxman et al. 1985). These reports cover every aspect of the survey and are not written for particular audiences. As the APU's programme has progressed, the emphasis in the research has shifted from an overriding interest in monitoring change to obtaining and disseminating information about pupils' performance, especially in relation to age differences, gender differences and differences within 20 percent attainment bands. The mathematics team has developed extensive coding of error responses to individual items, so the contrasts in performance relate to error and omission rates as well as to facilities (success rates).

This information is included in short reports written specially for teachers. These have taken two forms: booklets on topics such as Decimals (Mason & Ruddock, 1986), Practical Mathematics (Foxman, 1987), Attitudes and Gender Differences (Joffe & Foxman, 1988) and the Cockcroft Foundation List (Ruddock, 1988); and 4-page leaflets, mainly for primary teachers, which highlight the main findings in particular areas. It is likely that much of the reporting of the 1987 results will also be written with its implications for teachers in mind.



Table 2

Comparing Decimals: Different Success Rates

Item 1
Which of the numbers below has the greatest value?

            % of pupils selecting response
A. 0.075     1
B. 0.09      1
C. 0.1      82
D. 0.089    14
Other        1
Omit         1

Item 2
Which of the numbers below has the smallest value?

            % of pupils selecting response
A. 0.625    34
B. 0.25      3
C. 0.375     2
D. 0.125    37
E. 0.5      22
Other        1
Omit         1
Results

Initial analyses of most of the data from the 1987 survey of 11-year-olds have been carried out, but those from the age 15 survey are still incomplete. The examples given below from 1987 therefore relate to the younger pupils only, in particular the discussion of calculator skills. Other examples relate to the APU's methods of presenting results, especially in documents for teachers. Since the overall pattern of results in 1987 is very similar to that of 1978 to 1982, some results from these surveys are included in the illustrations. In order to highlight factors which influence success, error and omission rates, there has been an emphasis on comparing the results of parallel items within and between assessment modes.

Concepts and Skills 

Decimals. A feature of successive surveys has been the development of items to explore or extend particular findings obtained in previous surveys. For example, in the first age 15 survey in 1978 two written test items requiring pupils to compare decimals less than one obtained markedly different success rates. In response to Item 1 (Table 2), 82 percent correctly chose 0.1 as the largest decimal, while only 37 percent were successful in selecting the smallest decimal in Item 2. Furthermore, while there were two popular responses to Item 1, there were three to Item 2. About 1,000 pupils took each question illustrated in this section.

After experimenting with further items and interviewing pupils, we found these results to be due to two errors, which we call "largest is smallest" or, conversely, "smallest is largest" (LS error), and "decimal point ignored" (DPI error). For example, in Item 2 the LS error makers choose alternative A, since 625 is the largest number after the decimal point; the DPI pupils select response E, since this is the smallest number if the decimal points are ignored. The LS error was found to be largely unknown to teachers and other mathematics educators despite its high incidence in both age groups.

In Table 3, the results for Item 2 are contrasted with those for two new items, Item 3 and Item 4. Item 3 is the same as Item 2 except that an additional digit has been added to alternative C. Item 4 differs from Item 2 in that the largest instead of the smallest decimal is required. These changes are sufficient to make dramatic differences to the results. The correct response and the two error responses are marked on the items in Table 3.

In the case of Item 4, pupils responding correctly and those making the DPI error select the same response. Item 1's high facility can now be seen to arise because its correct response is also the one selected by pupils making the LS error.
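The two error patterns can be expressed as simple selection rules. The Python sketch below is an illustrative model, not the APU's analysis: it assumes that a pupil making the LS error reads the digits after the point as a whole number and takes a larger such number to mean a smaller decimal, while a pupil making the DPI error compares the options as if the decimal points were absent. The names `choose`, `after_point` and `ignore_point` are hypothetical.

```python
from decimal import Decimal


def after_point(option: str) -> int:
    """Digits after the decimal point read as a whole number: '0.075' -> 75."""
    return int(option.split(".")[1])


def ignore_point(option: str) -> int:
    """The option read with its decimal point removed: '0.625' -> 625."""
    return int(option.replace(".", ""))


def choose(options: list[str], want: str, rule: str) -> str:
    """Return the option a pupil following `rule` would select.

    want: "largest" or "smallest"
    rule: "correct", "LS" (largest is smallest) or "DPI" (decimal point ignored)
    """
    if rule == "correct":
        key, reverse = Decimal, False
    elif rule == "DPI":
        key, reverse = ignore_point, False
    elif rule == "LS":
        # A larger number after the point is taken to mean a smaller decimal.
        key, reverse = after_point, True
    else:
        raise ValueError(f"unknown rule: {rule}")
    pick_max = (want == "largest") != reverse
    return max(options, key=key) if pick_max else min(options, key=key)


item1 = ["0.075", "0.09", "0.1", "0.089"]           # Item 1 asks for the largest
item2 = ["0.625", "0.25", "0.375", "0.125", "0.5"]  # Item 2: smallest; Item 4: largest

for rule in ("correct", "LS", "DPI"):
    print(rule, choose(item1, "largest", rule),
          choose(item2, "smallest", rule), choose(item2, "largest", rule))
```

Under these assumptions the rules reproduce the coincidences described above: the LS rule picks 0.1 in Item 1, so LS pupils are scored as correct there, and the DPI rule picks 0.625 in Item 4, which is also the correct answer. The model only covers options whose whole-number part is zero, as in these items.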



The surveys began in 1978 with totally separate Concepts and Skills item collections for the two age groups. When the results of the first surveys became known, it was clear that there was a considerable overlap in performance between them, and so in later surveys an increasing number of items has been common to both. In 1987 Item 2, above, was included in the age 11 survey together with a parallel version placing the numbers in a context. The out-of-context results are very similar to those for other items of this type in previous age 11 surveys.

These results (Table 4) show that the younger pupils' success rate is about 25 percentage points below that of the 15-year-olds, about average for the items common to both age groups. The effect of context on success rate is negligible, but there is a shift in the balance of the incidences of the two errors from LS to DPI. This was anticipated from the results of similar items for lower-attaining 15-year-olds in another project carried out by the NFER for the DES (Foxman et al. 1988).

Table 3

Comparing decimals: The two main errors

Percent of pupils selecting response

Item 2: Which of the numbers below has the smallest value?
A. 0.625    34%   (LS)
B. 0.25      3%
C. 0.375     2%
D. 0.125    37%   (Correct)
E. 0.5      22%   (DPI)
Other        1%
Omit         1%

Item 3: Which of the numbers below has the smallest value?
A. 0.625     4%
B. 0.25      2%
C. 0.375    36%   (LS)
D. 0.125    43%   (Correct)
E. 0.5      13%   (DPI)
Other        0%
Omit         2%

Item 4: Which of the numbers below has the largest value?
A. 0.625    60%   (Correct + DPI)
B. 0.25      0%
C. 0.375     0%
D. 0.125     0%
E. 0.5      33%   (LS)
Other        5%
Omit         2%

Table 4

1987 Age 11 Surveys: Comparing decimals in and out of context

Which of the numbers below has the smallest value?

            Item 5           Item 6
A. 0.625    25%  (LS)        12%  (LS)
B. 0.25      2%               4%
C. 0.375     1%               2%
D. 0.125    12%  (Correct)   10%  (Correct)
E. 0.5      56%  (DPI)       71%  (DPI)

The results of the Concepts and Skills tests are also reported in five 20 percent attainment bands, and these reveal another aspect of the decimal results (Table 5).

For both items the LS error is made by more upper attainers than lower attainers, while the DPI error is a characteristic response of the lowest 40 percent. Previous surveys showed this to be the case for 15-year-olds as well. The LS error is, therefore, more "advanced" than the DPI error.
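Banded reporting of this kind amounts to ranking pupils by their overall test score, cutting the ranking into five equal groups, and tabulating response codes within each group. The sketch below assumes a hypothetical input of (score, response code) pairs and simply splits the ranked list into fifths; how the APU actually formed its bands, including the handling of tied scores at band boundaries, is not described here. The function name `band_percentages` is hypothetical, and the sample data are invented for illustration only.

```python
from collections import Counter

BANDS = ["Bottom", "Lower middle", "Middle", "Upper middle", "Top"]


def band_percentages(pupils: list[tuple[float, str]]) -> dict[str, dict[str, float]]:
    """Rank pupils by score, split them into five equal-sized bands, and
    return the percentage of each response code within every band."""
    ranked = sorted(pupils, key=lambda p: p[0])  # lowest score first
    n = len(ranked)
    result: dict[str, dict[str, float]] = {}
    for i, band in enumerate(BANDS):
        # Band i covers the ranks from i*n/5 up to (but not including) (i+1)*n/5.
        members = ranked[i * n // 5:(i + 1) * n // 5]
        if not members:
            result[band] = {}
            continue
        counts = Counter(code for _, code in members)
        result[band] = {code: 100 * c / len(members) for code, c in counts.items()}
    return result


# Invented example: ten pupils with a total score and their response to one item.
pupils = [(1, "DPI"), (2, "DPI"), (3, "DPI"), (4, "LS"), (5, "DPI"),
          (6, "LS"), (7, "LS"), (8, "correct"), (9, "LS"), (10, "correct")]
for band, shares in band_percentages(pupils).items():
    print(band, shares)
```

Reading a row of Table 5 across the bands corresponds to looking up one response code in each of the five dictionaries returned here.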

Calculator Skills in 1987 

Tests of calculator skills have been used in all previous age 15 surveys in the 1-to-1 practical assessment. At age 11, the only calculator test was in the 1982 Primary Survey, so the picture of pupils' skills in this area has been considerably extended in 1987. Calculator use may be mandatory or optional. In the latter case calculators are made available for pupils to use, but it is up to them to decide if and when to use them.



The 1987 APU primary mathematics survey contained examples of both types. In written tests it is difficult to make calculators mandatory, but it is possible in the 1-to-1 practical tests. There was one topic in the practical testing which was specifically concerned with calculator skills, and the assessor required the pupil to use the calculator to answer most of the questions put. The Calculator Skills topic also included the assessment of pupils' ability to approximate before calculating and to decide whether an answer was reasonable after doing so. In some other practical tests, and in a number of the written Concepts and Skills, Theme, and Problems and Patterns tests, calculators were available.

There are important differences between mandatory and optional uses of a calculator in what is being assessed. In a calculator-available situation, the ability to choose efficiently between the calculator and mental or pencil-and-paper methods as the most effective way of reaching an answer is an integral part of the assessment. This means that pupils can avoid using a calculator in situations where they do not know how to use it for a particular calculation, or do not know how to interpret the result in the calculator display. Such situations can be assessed in an interview, where calculator use can be made mandatory.



Table 5

Attainment band analyses of decimals in and out of a context

Ordering decimals: no context

Attainment band (20% each)   Bottom   Lower middle   Middle   Upper middle   Top
Correct                        1%         1%            4%        11%         49%
LS error                      10%        12%           26%        46%         34%
DPI error                     77%        81%           66%        37%         12%

Ordering decimals: in a context

Attainment band (20% each)   Bottom   Lower middle   Middle   Upper middle   Top
Correct                        1%         1%            3%         5%         39%
LS error                       2%         5%           13%        23%         21%
DPI error                     84%        92%           80%        61%         33%
Two tasks, one from the practical tests and one from the written tests, illustrate the difference in performance that was found to be associated with the calculator-mandatory and calculator-available modes.

Practical Topic 

Calculator mandatory: "John spends £27.45 on shopping. His bus fare was 60p and a meal at a café cost £3.85. How much did he spend altogether?" (n = 310 pupils)

The correct answer, £31.90, was given by 28 percent of pupils, but the most common response (32 percent) consisted of variations on the digit string 91.3. Evidence on the methods used by pupils usually has to be inferred from the answer given in a written test, but a strength of practical assessment in the APU survey is the direct recording of method by the assessor present. For the question above, 20 percent of pupils worked in pence, and 37 percent entered 27.45 + 60 + 3.85, obtaining the digit string 91.3. In a written test in which a calculator was available, an item with smaller numbers produced a very different pattern of responses.

Calculator Skills Written Test 

Calculator available: "What is the cost of this shopping trip?"
Bus fare 90p
Hamburger and coke £1.15
Shopping £4.25

(n = 392 pupils)

In this mode the correct answer, £6.30, was given by 66 percent of pupils, with variations on 9.54 given by only 7 percent. The success rate was thus rather higher than in the practical topics, and the proportion of pupils who obtained 95.4 on the calculator (by entering 90p as 90 rather than 0.9) and then used these digits in their answer, 7 percent, is much lower. The difference between these results and those from the practical survey, where calculators are known to have been used, can be accounted for in several ways. Choosing not to use the calculator may be one factor; using it and then rejecting the answer in favour of a later one, obtained with or without the calculator, is another.

These data support the view, obtained from teacher ratings of the frequency of calculator use, that the British population of 11-year-olds in 1987 was relatively naive in terms of calculator experience. When a calculator was mandatory for a calculation, as in the practical topics, mixed units were a large-scale problem, and interpreting the displayed answer (13.2) to £66 ÷ 5 produced a range of responses such as 13.2 (39 percent of pupils), £13 and 2 pence (16 percent) and 132 (14 percent). Only 18 percent immediately responded with £13.20.

Dealing with mixed units and interpreting the display are essential calculator skills, but they were not widely mastered by 11-year-olds in 1987.
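Both difficulties, mixing pounds with pence and reading the display as money, can be reproduced with a few lines that mimic a basic calculator. This is an illustrative sketch: `display` assumes the calculator simply drops trailing zeros after the decimal point, which is typical of simple calculators but is an assumption here, and the function names are hypothetical. `Decimal` is used so that the sums of money are exact.

```python
from decimal import Decimal


def display(value: Decimal) -> str:
    """Mimic a simple calculator display by dropping trailing zeros after the point."""
    text = format(value, "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


def as_money(pounds: Decimal) -> str:
    """Read a result in pounds as a sum of money, e.g. 13.2 -> '£13.20'."""
    return f"£{pounds:.2f}"


# Practical topic: 60p keyed in as 60 alongside amounts in pounds.
mixed = Decimal("27.45") + Decimal("60") + Decimal("3.85")
# The same sum with every amount entered in pounds.
consistent = Decimal("27.45") + Decimal("0.60") + Decimal("3.85")
# Written test: 90p keyed in as 90.
written_mixed = Decimal("90") + Decimal("1.15") + Decimal("4.25")
# Dividing £66 by 5: the display shows 13.2, which must be read as £13.20.
share = Decimal("66") / Decimal("5")

print(display(mixed))                             # 91.3
print(display(consistent), as_money(consistent))  # 31.9 £31.90
print(display(written_mixed))                     # 95.4
print(display(share), as_money(share))            # 13.2 £13.20
```

The point of separating `display` from `as_money` is that the last step, turning a display such as 13.2 into £13.20, is an interpretation the calculator does not make for the pupil.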

Comparisons between response patterns to the same task in non-calculator and calculator-available written tests were also carried out.

Assessment with and without a calculator for the same task: the apparently simple task of calculating 6.25 - 4 provides some interesting points for discussion (Table 6).

Without a calculator the item tests both algorithmic competence and place value. When a calculator is used, the task should be a straightforward data-entry exercise, but it did not produce the success rate of over 90 percent expected from such tasks. Again, it seems likely that some pupils incorrectly judged that a non-calculator computation was the best method for them. The choice of when to opt for calculator-based computation rather than pencil-and-paper or mental working is a crucial one, and an aspect of calculator use which these data suggest needs further investigation. Items like the one above, which may appear deceptively easy, can be useful in this respect.

Table 6

Comparison of non-calculator and calculator-available tasks

Responses given to 6.25 - 4 =
                        2.25 (Correct)   225   6.21   621   Other   Omit
Non-calculator               31%          -     40%    3%    13%     13%
Calculator available         66%          4%    17%    2%    10%      1%

Non-calculator n = 1,000; calculator available n = 392.

The same task given without a calculator being available produced a range of differences both in success rate and in response pattern. Apart from tasks which are straightforward data-entry exercises with a calculator but need awkward algorithms without one, two basic types were found:

• tasks where calculator use produces a higher success rate;

• tasks where calculator use produces a similar or lower success rate.

Tasks where success rates were found not to be higher when a calculator was provided can be summarised as those where the difficulty lies in finding an appropriate method rather than in the computation or algorithm. For British 11-year-olds, topics such as rate and ratio and percentages showed this pattern.

Problem-Solving Strategies

Problem-solving strategies were tested in 1987 in the Problems and Patterns written tests, in some of the 1-to-1 practical topics, and in the small group problem solving. In the Problems and Patterns tests there are usually graded questions on five situations involving component problem-solving strategies: for example, continuing patterns, generalising them and explaining how they work, working systematically, using trial-and-error methods, and so on.

In one example a subtraction is presented with two missing figures:

50 
2 7 

Pupils are first asked to supply one set of figures which will make the subtraction correct, and then other possible answers. A similar problem follows which has six correct answers.

At age 11 in 1987, 22 percent of the pupils obtained all six correct answers, a slightly lower figure than in the 1982 survey. In the 1982 survey of 15-year-olds, 60 percent obtained all six correct answers. A higher proportion of the older pupils used systematic working to get their answers.

In general there is much more of a requirement for pupils to explain findings and to record their working than in the Concepts and Skills tests. In the above example pupils usually supplied sufficient evidence to judge whether their working was systematic, but in most situations it is very difficult to get them to record their working spontaneously. The advantage of the 1-to-1 practical tests is that pupils can be observed and can be asked about their methods of working. The 1-to-1 problem-solving tests have included both mathematical situations and "everyday" problems such as arranging a Class Trip, organising a Birthday Party (11-year-olds), or Designing a Kitchen (15-year-olds).
With the advent of the small group assessment situation in 1987, the opportunity was taken to attempt some cross-modal comparisons. One topic, Number Chains, was tried out in a written Problems and Patterns test and also in both the 1-to-1 and small group assessments. The situation involved applying a rule to whole numbers: "If it's even, halve it; if it's odd, add 3." Applying this rule to a number, and then successively to the result of each transformation, forms a number chain, e.g. 15, 18, 9, 12, 6, 3, and so on. All chains end in one of two loops, 6 3 or 4 2 1. In the 1-to-1 practical and the small group assessments the pupils were given plenty of opportunity to derive the rule themselves and to test out any conjectures they had made about what it could be before being presented with the substantive problem: to find out what sort of numbers end in 6 3 and what sort end in 4 2 1. The results show clearly the value of both pupil-teacher and pupil-pupil discussion in problems which are within the capacity of most 11-year-olds to solve, as compared with a printed textbook presentation. Over a third obtained the correct answer in the small group assessment with no help from an adult, compared with a quarter in the 1-to-1 who obtained it with little or no help, and only 1 percent in the written test version. In the small group an additional 13 percent got most of the way towards a correct solution without help, and in the 1-to-1 practical a further 6 percent obtained the correct answer with a lot of help.
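The Number Chains rule is easy to explore mechanically. The sketch below (function names hypothetical) builds a chain from a positive whole number until a number repeats and reports which loop it falls into. Running it over the first few whole numbers suggests that the chains ending in 6 3 are exactly those starting from multiples of 3; this follows because halving an even number and adding 3 to an odd one never change whether a number is a multiple of 3.

```python
def next_number(n: int) -> int:
    """The Number Chains rule: if it's even, halve it; if it's odd, add 3."""
    return n // 2 if n % 2 == 0 else n + 3


def chain(start: int) -> list[int]:
    """Apply the rule repeatedly until a number repeats, returning the chain so far."""
    seen: list[int] = []
    n = start
    while n not in seen:
        seen.append(n)
        n = next_number(n)
    return seen


def end_loop(start: int) -> set[int]:
    """Return the set of numbers in the loop that the chain finally falls into."""
    numbers = chain(start)
    first_repeat = next_number(numbers[-1])  # the value that closed the loop
    return set(numbers[numbers.index(first_repeat):])


print(chain(15))  # [15, 18, 9, 12, 6, 3]
print([n for n in range(1, 31) if end_loop(n) == {3, 6}])
# [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
```

A program like this only checks cases; the divisibility argument in the lead-in is what makes the pattern hold for every starting number.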

Attitudes and Gender Differences 

In previous surveys pupils' attitudes have been studied by means of written questionnaires and by assessors' observations of pupils' responses to the 1-to-1 practical tests. In 1987 pupils' views were additionally sought on all written tests.

The attitude questionnaires investigate pupils' feelings towards mathematics generally and towards individual mathematics topics, on scales relating to enjoyment, utility, and difficulty. The results follow a similar pattern to that observed in other attitude surveys conducted both in the UK and in other countries: pupils find mathematics less enjoyable, more difficult, and less useful as they get older. More boys than girls perceived mathematics as being relevant to their futures, enjoyable, and "one of their better subjects." Although both boys and girls liked mathematics as a subject, the most frequently mentioned reason for disliking it was that it was too difficult, a reason given by more girls than boys. More boys than girls thought that "without maths our lives would be harder," that it is difficult to get on in life if "you haven't done much maths," and that it would help them to get a job one day.

In respect of performance, the surveys have consistently shown that gender differences across Concepts and Skills topic areas (computation and measurement topics) are as great at age 11 as they are at age 15. They are also larger than the differences within topics which develop in favour of boys during secondary schooling. Most important of all, perhaps, nearly all the differences in performance between boys and girls are accounted for by the top 10 to 20 percent of attainers in most areas of mathematics at both ages. Thus, all the important gender differences are well established by the age of 11 in Britain.

Recently some more encouraging results for the girls have appeared. In the Problems and Patterns tests in 1982, girls were very slightly ahead at age 11 and even more so at age 15: a reversal of the trend obtained in Concepts and Skills.

Summary and Conclusions 

The most important aspect of the project has been 
the development of new assessments and the breadth 
and richness of information provided by pupils' per- 
formance at the two age groups tested. A wide range of 
assessment modes has been employed including practi- 
cal mathematics, problem solving in small groups, and 
mathematics on the micro. Calculator and mental 
skills have also been explored. 

The 1987 age 11 survey results so far to hand show that the overall pattern of results is similar to that previously found in the phase of annual surveys from 1978 to 1982. In general, the Concepts and Skills and Problems and Patterns mean scores in 1987 are similar to those of 1982, but there are differences in detail: an upward move in the mean scores of the spatial sub-categories and a downward one in the number sub-categories. There is evidence that these are probably linked to differences in emphasis in the mathematics curriculum, but that link has still to be established. An important feature of the work remaining will be to disseminate to teachers, by reports and other means, the results and their implications for teaching.
Every four years, the National Assessment of Educational Progress (NAEP) gathers information about the mathematics performance of students in the United States at the elementary, middle, and high school levels. NAEP mathematics assessments were conducted during the school years ending in 1973, 1978, and 1982. The fourth and most recent assessment was conducted in 1986 with a nationally representative sample of approximately 35,000 9-year-old, 13-year-old, and 17-year-old students.

The objectives that guided the development of the fourth mathematics assessment covered seven broad content areas:

1. Fundamental Methods of Mathematics 

2. Discrete Mathematics 

3. Data Organization and Interpretation 

4. Measurement 

5. Geometry 

6. Relations, Functions, and Algebraic 
Expressions 

7. Numbers and Operations

Each of these content areas was assessed at five 
process levels: understanding/comprehension, 
knowledge, skill, routine application, and problem 
solving. 

From its inception, NAEP has developed assessments through a consensus process. The objectives that provided a framework for the fourth mathematics assessment were written and reviewed by a panel of mathematics educators, including classroom teachers. The objectives focused on content that should have been covered by a majority of students at a given grade level.

The first three assessments were conducted by the Education Commission of the States, whereas the fourth mathematics assessment was conducted by the Educational Testing Service. Using the framework of content categories and process levels outlined above, a group of mathematics educators worked with the staff of the Educational Testing Service to develop the items for the fourth mathematics assessment. The items were extensively reviewed by subject-matter and measurement specialists. A set of unreleased items from previous assessments was included in the fourth assessment to provide continuity and to establish a basis for measuring change in performance from previous assessments. The items were field-tested, revised, and administered to a stratified, multi-stage probability sample.

Some changes in methodology accompanied the 
change to the Educational Testing Service as the 
administrator of the assessment. In the fourth assess- 
ment, subjects were selected by grade level rather than 
by age. Matrix sampling procedures were used to iden- 
tify a representative national sample of third-grade, 
seventh-grade, and eleventh-grade students. 

There were also some changes in the actual administration of the assessment. Items from the previous assessments were frequently open-ended, whereas the newly developed items were all multiple choice. In previous assessments, a paced audio recording was used to read each item to the students, so the time allotted for each item was controlled. In this assessment, test items were divided into blocks of approximately 15 minutes each. Each student was administered a booklet containing three blocks of cognitive items and a six-minute background questionnaire. In order to provide broad coverage of topics, item-sampling procedures were used, as in earlier assessments: each student received approximately 10 to 15 percent of the items administered at each grade level. Approximately 2,000 students started each block of items, but because of the time limit some students did not complete all the items. Thus, performance on individual items that appear toward the end of a block is more difficult to interpret.
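The booklet arrangement described above can be illustrated with a small sketch. It is not NAEP's actual design. It assumes a hypothetical pool of 24 equal-length blocks, booklets built cyclically from three blocks each, and booklets handed out in rotation to a hypothetical sample of 14,400 students at one grade level. Under these assumptions each student sees 3/24 = 12.5 percent of the pool, within the 10 to 15 percent range reported, and every block is attempted by the same number of students.

```python
from collections import Counter

def cyclic_booklets(n_blocks, offsets=(0, 1, 3)):
    """Booklet i holds blocks (i + d) mod n_blocks for each offset d.

    With distinct offsets, each booklet contains distinct blocks and each
    block appears in exactly len(offsets) booklets (hypothetical design).
    """
    return [tuple((i + d) % n_blocks for d in offsets) for i in range(n_blocks)]

def spiral(n_students, booklets):
    """Hand booklets out in rotation so each is used about equally often."""
    return [booklets[s % len(booklets)] for s in range(n_students)]

N_BLOCKS = 24        # hypothetical size of the item pool, in blocks
N_STUDENTS = 14_400  # hypothetical sample at one grade level

booklets = cyclic_booklets(N_BLOCKS)
assignments = spiral(N_STUDENTS, booklets)

share_of_pool = 3 / N_BLOCKS  # fraction of the pool any one student sees
students_per_block = Counter(b for booklet in assignments for b in booklet)

print(f"share of pool per student: {share_of_pool:.1%}")
print("students per block:", set(students_per_block.values()))
```

Operational designs typically also balance which blocks appear together and in which order. This sketch does not attempt that.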

Previous mathematics assessment results have been reported on an item-by-item basis, which has proven particularly useful to researchers, curriculum developers, and teachers. This practice was continued for the fourth assessment. The item-level results appear in companion articles in Arithmetic Teacher (Kouba et al., 1988a, 1988b) and in the Mathematics Teacher (Brown et al., 1988a, 1988b), and in a monograph published by the National Council of Teachers of Mathematics (Lindquist et al., in press). Although item-level reporting of results has allowed for a reasonably clear and detailed description of the current level of performance of students in the United States, the level of detail makes it difficult to draw broad general conclusions about overall performance and how it has changed over time. In the past, NAEP has attempted to provide some summary of the data by aggregating them over content areas or process levels at each grade level. This procedure has not proved entirely satisfactory because the aggregated scores have little meaning. For the fourth assessment, NAEP constructed scales in order to provide a profile of performance trends.

Performance Scales 

In order to report performance trends across the assessments, a supplemental sample of subjects was selected by age rather than grade level and administered previously assessed items according to the procedures used in prior assessments. The item pool consisted of all items given in 1986 and in at least one of the previous two assessments. The trend assessment included 68 items for 9-year olds, 98 for 13-year olds, and 94 for 17-year olds. The responses were scored, weighted in accordance with the population structure, and adjusted for nonresponse. Item Response Theory (IRT) technology was used to estimate levels of mathematics achievement for the nation and for various subpopulations along a single scale.
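The text does not say how the weighting and nonresponse adjustment were carried out. As a hedged illustration only, the sketch below applies a common textbook technique, a weighting-class adjustment. Within each class, respondents' sampling weights are scaled up so that they also represent the nonrespondents, and a weighted percent correct is then computed. The classes, weights, and responses are all hypothetical.

```python
def adjust_for_nonresponse(sample):
    """Weighting-class nonresponse adjustment (generic technique).

    Within each class, each respondent's weight is multiplied by
    (total weight sampled) / (total weight of respondents).
    Assumes every class has at least one respondent.
    """
    sampled, responded = {}, {}
    for s in sample:
        sampled[s["cls"]] = sampled.get(s["cls"], 0.0) + s["w"]
        if s["responded"]:
            responded[s["cls"]] = responded.get(s["cls"], 0.0) + s["w"]
    return [
        {**s, "w": s["w"] * sampled[s["cls"]] / responded[s["cls"]]}
        for s in sample
        if s["responded"]
    ]

def weighted_percent_correct(respondents):
    """Percent correct, counting each respondent by their weight."""
    total = sum(s["w"] for s in respondents)
    correct = sum(s["w"] for s in respondents if s["correct"])
    return 100 * correct / total

# Hypothetical sample: each class A student stands for 100 students in the
# population, and each class B student for 50.
sample = [
    {"cls": "A", "w": 100, "responded": True, "correct": True},
    {"cls": "A", "w": 100, "responded": True, "correct": False},
    {"cls": "A", "w": 100, "responded": False, "correct": None},
    {"cls": "B", "w": 50, "responded": True, "correct": True},
    {"cls": "B", "w": 50, "responded": True, "correct": True},
]

respondents = adjust_for_nonresponse(sample)
print(weighted_percent_correct(respondents))  # 62.5
```

Three of the four respondents answer correctly, so the unweighted figure would be 75 percent. The weighted estimate is lower because class A, which does worse in this made-up sample, represents more of the population.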



With IRT it is possible to summarize the perform- 
ance of a sample of students on a single scale, even if 
different students were administered different items. 
Using scaling techniques, NAEP was able to identify 
items that had similar statistical properties and use 
those items to define different levels of performance. 
These levels of performance were then used to gener- 
ate, for different assessments and for different subpopu- 
lations, numerical scores that have criterion-referenced 
interpretations. That is, certain scores can be related to 
the attainment of certain skills on a hypothesized 
continuum of proficiency, and these scores can be used 
to describe the performance of different ages and subgroups based on a common standard. A more complete
description of the scaling procedures can be found in 
The Mathematics Report Card (Dossey, Mullis, Lind- 
quist, & Chambers, 1988). 
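The idea of placing students who answered different items on one scale can be illustrated with an item response model. The text does not name the model. This sketch assumes the three-parameter logistic (3PL) model, in which each item has a discrimination a, a difficulty b, and a guessing parameter c. It estimates one student's proficiency theta by maximizing the likelihood over a grid. All item parameters are hypothetical. NAEP's own estimation of group proficiencies, described in The Mathematics Report Card, is more involved than this single-student sketch.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability that a student at proficiency theta answers correctly."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def log_likelihood(theta, responses):
    """responses: list of ((a, b, c), score) pairs; score 1 = right, 0 = wrong."""
    total = 0.0
    for (a, b, c), score in responses:
        p = p_correct(theta, a, b, c)
        total += math.log(p if score == 1 else 1 - p)
    return total

def estimate_theta(responses, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood estimate of theta by grid search on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses))

# Two hypothetical item sets with no items in common. The "hard" set has the
# same a and c values as the "easy" set, but difficulties 1.5 units higher.
easy = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.2)]
hard = [(1.0, 0.5, 0.2), (1.2, 1.0, 0.2), (0.8, 1.5, 0.2)]

# Both students answer the first two items correctly and miss the third.
theta_a = estimate_theta(list(zip(easy, [1, 1, 0])))
theta_b = estimate_theta(list(zip(hard, [1, 1, 0])))
print(round(theta_a, 2), round(theta_b, 2))
```

The two students share no items. Even so, the same pattern of right and wrong answers on items 1.5 units harder yields an estimate 1.5 units higher, because both estimates are expressed on the common theta scale.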

IRT scales have a linear indeterminacy which may 
be resolved by an arbitrary choice of the origin and unit- 
size in each given subscale. The mathematics scale was 
linearly transformed so that the final scale would have 
Table 1

Levels of Mathematical Proficiency

Level 150 - Simple Arithmetic Facts

Learners at this level know some basic addition and subtraction facts and can add two-digit numbers
without regrouping. They recognize simple situations in which addition and subtraction apply. They also
are developing rudimentary classification skills.

Level 200 - Beginning Skills and Understanding 

Learners at this level have considerable understanding of two-digit numbers. They can add two-digit
numbers, but are still developing an ability to regroup in subtraction. They know relations among coins,
can read information from charts and graphs, and use simple measurement instruments. They are
developing some reasoning skills.

Level 250 - Basic Operations and Problem Solving 

Learners at this level have an initial understanding of the four basic operations. They are able to add and
subtract whole numbers and apply these skills to one-step word problems and money situations. In multi-
plication, they can find the product of a two-digit and a one-digit number. They can also compare
information from graphs and charts and are developing an ability to analyze simple logical relations.

Level 300 - Moderately Complex Procedures and Reasoning 

Learners at this level are developing an understanding of number systems. They can compute with 
decimals, simple fractions, and commonly encountered percents. They can identify geometric figures, 
measure lengths and angles, and calculate areas of rectangles. These students are also able to interpret 
simple inequalities, evaluate formulas, and solve simple linear equations. They can find averages, make 
decisions on information drawn from graphs, and use logical reasoning to solve problems. They are 
developing the skills to operate with signed numbers, exponents, and square roots. 

Level 350 - Multi-step Problem Solving and Algebra 

Learners at this level can apply a range of reasoning skills to solve multi-step problems. They can solve
routine problems involving fractions and percents, recognize properties of basic geometric figures, and
work with exponents and square roots. They can solve a variety of two-step problems using variables,
identify equivalent algebraic expressions, and solve linear equations and inequalities. They are develop-
ing an understanding of functions and coordinate systems.
Table 2

Percent of Students at or Above the Five Proficiency Levels

Proficiency Level    Age 9    Age 13    Age 17
150                  98       100       100
200                  74       99        99
250                  21       73        96
300                  1        16        51
350                  0        0         6



Table 3

Average mathematics proficiency levels: 1973-1986

Age    1973       1978           1982           1986
9      (219.1)    218.6 (0.8)*   219.0 (1.1)    221.7 (1.0)
13     (266.0)    264.1 (1.1)*   268.6 (1.1)    269.0 (1.2)
17     (304.4)    300.4 (0.9)    298.5 (0.9)*   302.0 (0.9)

*Statistically significant difference from 1986 at the 0.05 level.

Jackknifed standard errors are presented in parentheses. The 1973 values, also shown in parentheses, are extrapolated estimates.
a weighted mean of 250.5 and a weighted standard deviation of 50 across all students in the three ages. An additional benefit of IRT methodology is that it provides for a criterion-referenced interpretation of levels on this continuum of proficiency. Although the proficiency scale ranges from 0 to 500, few items fell at the ends of the continuum. Thus, five levels of proficiency, ranging from 150 to 350, were chosen for describing the results. Each level is defined by describing the types of mathematics questions that most students attaining that proficiency level would be able to answer successfully. The levels are described in Table 1. The estimated proportion of each age group at or above each of the five proficiency levels is reported in Table 2.
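The linear rescaling and the "at or above" summaries can be sketched as follows. The sketch assumes that the weighted standard deviation is computed with weights normalized to their total, with no n - 1 correction. The proficiency values and weights in the usage lines are hypothetical, not NAEP data.

```python
import math

def to_reporting_scale(thetas, weights, target_mean=250.5, target_sd=50.0):
    """Linearly rescale thetas so their weighted mean and SD hit the targets."""
    wsum = sum(weights)
    mean = sum(w * t for w, t in zip(weights, thetas)) / wsum
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas)) / wsum
    slope = target_sd / math.sqrt(var)
    return [target_mean + slope * (t - mean) for t in thetas]

def percent_at_or_above(scores, weights, levels=(150, 200, 250, 300, 350)):
    """Weighted percent of students scoring at or above each level."""
    wsum = sum(weights)
    return {
        level: 100 * sum(w for s, w in zip(scores, weights) if s >= level) / wsum
        for level in levels
    }

# Hypothetical proficiency estimates for four equally weighted students.
weights = [1, 1, 1, 1]
scores = to_reporting_scale([-1.0, 0.0, 1.0, 2.0], weights)
print(percent_at_or_above(scores, weights))
# {150: 100.0, 200: 75.0, 250: 50.0, 300: 25.0, 350: 0.0}
```

The percentages fall as the level rises because each figure is cumulative: a student counted at Level 300 is also counted at every lower level. Table 2 is read the same way.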

It is important to note that since the items in the NAEP pool were not developed to conform to some hypothesized framework of levels of mathematical proficiency, and since the proficiency levels were derived in a post hoc analysis of performance, these levels do not represent an idealized picture of mathematical proficiency. Further, it is certainly possible to define hypothetical levels of mathematical proficiency beyond those identified here. The reported levels are merely those that emerged from the statistical scaling of the available pool of items. Caution is urged in interpreting the results based on these levels: although statistically coherent, the available items do not necessarily fall into clearly defined clusters of related items.

National Trends 

Average mathematics proficiency levels for each age group for the four mathematics assessments are given in Table 3. Significant gains have been observed for all age levels over time. (The proficiency levels reported for 1973 reflect a rough estimate of extrapolated results based on previously reported NAEP data.)
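The asterisks in Table 3 can be checked with a simple approximation: treat the samples from two assessment years as independent, and divide the difference in means by the combined jackknifed standard error. This is an illustrative assumption; the text does not describe NAEP's exact testing procedure. Under this approximation, the comparisons with |z| greater than 1.96 are exactly the three marked with an asterisk.

```python
import math

# Table 3: age -> year -> (average proficiency, jackknifed standard error).
# The 1973 figures are extrapolated estimates without standard errors.
TABLE_3 = {
    9: {1978: (218.6, 0.8), 1982: (219.0, 1.1), 1986: (221.7, 1.0)},
    13: {1978: (264.1, 1.1), 1982: (268.6, 1.1), 1986: (269.0, 1.2)},
    17: {1978: (300.4, 0.9), 1982: (298.5, 0.9), 1986: (302.0, 0.9)},
}

def z_versus_1986(age, year):
    """z for (1986 mean - earlier mean), assuming independent samples."""
    mean_then, se_then = TABLE_3[age][year]
    mean_1986, se_1986 = TABLE_3[age][1986]
    return (mean_1986 - mean_then) / math.sqrt(se_then**2 + se_1986**2)

significant = {
    (age, year)
    for age in TABLE_3
    for year in (1978, 1982)
    if abs(z_versus_1986(age, year)) > 1.96
}

for age in TABLE_3:
    for year in (1978, 1982):
        print(age, year, round(z_versus_1986(age, year), 2))
print(sorted(significant))  # [(9, 1978), (13, 1978), (17, 1982)]
```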

Performance of 9-year olds, which had shown little change from 1973 to 1982, improved significantly between 1982 and 1986. Performance of 13-year olds, which had increased in the late 1970's and early 1980's, leveled off between the last two assessments, registering virtually no change from 1982 to 1986. For 17-year olds, the downward trend that had been characteristic of performance in the 1970's was reversed: seventeen-year olds made significant gains between 1982 and 1986. The gains for 17-year olds parallel the gains that were made by the same age cohort between 1978 and 1982. Although the same cohort pattern is not reflected in the results for 9-year olds, the relationship at ages 13 and 17 suggests that the causes underlying the recent improvements at age 17 extend beyond recent reforms in high school graduation requirements. These performance trends are shown in Figure 1.
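The cohort argument can be made concrete with the averages in Table 3. Students who were 13 in 1982 were about 17 in 1986. The sketch therefore compares that cohort's advantage at age 13 (over 13-year olds in 1978) with its advantage at age 17 (over 17-year olds in 1982). Treating the four-year gap between assessments as exactly one cohort step is a simplifying assumption.

```python
# Average proficiency from Table 3 (1973 extrapolations omitted).
AVERAGE = {
    13: {1978: 264.1, 1982: 268.6, 1986: 269.0},
    17: {1978: 300.4, 1982: 298.5, 1986: 302.0},
}

# The cohort aged 13 in 1982 and (about) 17 in 1986.
gain_at_13 = AVERAGE[13][1982] - AVERAGE[13][1978]  # at age 13, vs. 1978
gain_at_17 = AVERAGE[17][1986] - AVERAGE[17][1982]  # at age 17, vs. 1982

print(f"gain at age 13, 1978-1982: {gain_at_13:+.1f}")  # +4.5
print(f"gain at age 17, 1982-1986: {gain_at_17:+.1f}")  # +3.5
```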



Trends among Minorities 

Over the last decade, Black and Hispanic students have made significant gains in achievement in mathematics at all grade levels. Black students at all three ages have shown steady and significant gains across the past three assessments. Hispanic students at ages 9 and 17 have shown steady improvement over the past three assessments. Among Hispanic 13-year olds, there was little change in performance between 1973 and 1978; however, performance improved significantly from 1978 to 1986. In general, the gains of Black and Hispanic students have been greater and more consistent than the gains shown by White students. Nevertheless, although the gap between the performance of White students and the performance of Black and Hispanic students is narrowing, the performance differences between White students and these minority groups remain significant at all three age levels. The gains suggest that programs implemented over the last ten to twenty years to improve the performance of minority students are having an effect, but even greater efforts are needed to provide total equity of educational opportunity for all American students.



Trends by Gender 

Previous assessments have found few gender-related differences in mathematics achievement at ages 9 and 13, but at age 17 there have been small yet significant differences, with males scoring higher than females. The same pattern occurred in 1986. Although
there were no achievement differences at the youngest age level, more males than females obtained a proficiency level at or above Level 250 among 13- and 17-year olds. Differences were particularly evident among 13-year olds at Level 300, and among 17-year olds at Levels 300 and 350.

NAEP results also showed a significant advantage for males on geometry and measurement at Grades 3 and 11. Females tended to outperform males in the area of knowledge and skills, while males showed a consistent advantage in the area of higher-level applications. There were no gender differences on the algebra subscale. As early as age 13, significantly more males than females responded that they were likely to enter a career that used mathematics, and more males than females responded that they were good at mathematics.



Performance Patterns 

More critical than the changes in students' performance over time are the patterns of current achievement. Only 21 percent of the 9-year olds mastered the basic mathematical operations and beginning problem-solving skills (Level 250) that are usually taught in elementary school. One-fourth of them failed to demonstrate even the beginning skills and understanding characterized by the next lower level of proficiency (Level 200). Among the 13-year olds, only 16 percent demonstrated a grasp of the moderately complex mathematical procedures and reasoning (Level 300) generally embedded throughout the middle and junior high school curriculum in the United States. About one-half of the 17-year olds reached this level, which can be characterized as being able to use moderately complex numerical procedures and to interpret simple inequalities, evaluate formulas, and solve simple linear equations. Less than 7 percent of the 17-year olds displayed abilities in multi-step problem solving and algebra (Level 350). Closer examination of the results reveals that most of the progress that has occurred over the past eight years is in the domain of lower-order skills.
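Several figures in this paragraph follow directly from Table 2. Because the table is cumulative, the share of students below a level is 100 minus the share at or above it, and the share between two levels is the difference of two entries. The following sketch uses only the Table 2 values:

```python
# Table 2: age -> proficiency level -> percent of students at or above it.
AT_OR_ABOVE = {
    9: {150: 98, 200: 74, 250: 21, 300: 1, 350: 0},
    13: {150: 100, 200: 99, 250: 73, 300: 16, 350: 0},
    17: {150: 100, 200: 99, 250: 96, 300: 51, 350: 6},
}

def percent_below(age, level):
    """Percent of an age group scoring below a level."""
    return 100 - AT_OR_ABOVE[age][level]

def percent_between(age, lower, upper):
    """Percent at or above `lower` but below `upper`."""
    return AT_OR_ABOVE[age][lower] - AT_OR_ABOVE[age][upper]

print(AT_OR_ABOVE[9][250])            # 21: 9-year olds at or above Level 250
print(percent_below(9, 200))          # 26: about one-fourth below Level 200
print(AT_OR_ABOVE[13][300])           # 16: 13-year olds at or above Level 300
print(percent_between(17, 300, 350))  # 45: 17-year olds at 300 but not 350
```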

Overall, third-grade students performed well on selected whole number computation items, but many appeared to lack mastery of place value and seemed to be learning mathematical skills at a rote manipulation level. About one-third of the seventh-grade students and one-fourth of the eleventh-grade students demonstrated extremely limited knowledge of some of the most basic mathematical concepts and skills. Although they could perform simple whole number calculations, they gave little evidence of knowledge of the most fundamental concepts of fractions, decimals, or percents. Similarly, they could identify simple geometric figures, make simple measurements, and read simple graphs, but they could not use basic properties of geometric figures, compute areas or volumes, or draw conclusions from graphs and tables. They lacked the ability to apply what they knew to a problem-solving situation. At a time when mathematical skills are in high demand in the workplace, few students in the last years of secondary school have mastered the fundamentals needed to perform more advanced mathematical operations.
Figure 1. National trends in mathematical proficiency.




Learning Concepts and Skills 

One of the central issues driving recent reforms in the U.S. mathematics curriculum has been the relative emphasis that should be placed on developing understanding of basic concepts and on teaching mathematical skills. The modern mathematics movement in the United States in the 1960's emphasized understanding, whereas the "back-to-basics" movement that followed in the 1970's focused on teaching skills. It is not a question, however, of choosing between understanding and skills. There is mounting evidence that students cannot learn skills effectively in isolation, but must understand the skills they are learning if they are going to retain them and be able to apply them in unfamiliar contexts.



Table 4

Performance on Basic Number Concept Items

                                         Percent Correct
Item                                     Grade 3   Grade 7   Grade 11

A. What is 100 more than 498?            37        64
B. 5 1/4 is the same as                            47        44
     5 + 1/4
     5 - 1/4
     5 × 1/4
     5 ÷ 1/4
C. Write .037 as a fraction.                       48        58



The results of the fourth NAEP mathematics assessment suggest that many students are failing to develop an understanding of important concepts underlying the skills they are attempting to learn. The difficulty that third-grade students encounter in adding three-digit whole numbers can be traced to their lack of understanding of place value concepts for three-digit numbers, and older students' difficulties with fractions, decimals, and percents reflect serious gaps in their knowledge of basic fraction, decimal, and percent concepts.

Performance on the items in Table 4 illustrates how limited many students' knowledge of the basic meanings of fractions and decimals is. Many students who were successful at routine, frequently encountered calculations had difficulty when they were asked questions that did not involve standard calculations presented in a familiar context, even when the questions involved basic number concepts.

Table 5

Generalizing the Formula for the Area of a Rectangle

                                              Percent Correct
Item                                          Grade 7   Grade 11

A. What is the area of this rectangle?        46        70
   [Figure: rectangle labeled 6 and 5]
B. What is the area of this square?           13        45
   [Figure: square labeled 12]

Table 6

Performance on Algebra Items

                                              Percent Correct
Item                                          Algebra 1   Algebra 2

A. Solve: 6x + 5 = 4x + 7                     83          91
B. Simplify: 9(1 + 5x) + 13                   74          87
C. x - y > x + y implies                      38          50
     y < 0
     x > 0
     x = 0
     x = y

The items in Table 5 illustrate the difficulty that students had generalizing the procedures that they had learned. Almost one-half of the seventh-grade students could calculate the area of a rectangle, but only 13 percent of them could apply this knowledge to find the area of a square, even though almost all students knew that the sides of a square are equal.

The items in Table 6 offer another example of the procedural orientation of many students. The majority of eleventh-grade students who had completed one or two years of algebra could perform the symbolic manipulations involved in solving equations or simplifying expressions. Far fewer of them, however (38 and 50 percent, respectively), could identify the relationship between the variables that was implied by the inequality in the third item.
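For reference, the three items in Table 6 can be worked as follows. Items A and B need only routine manipulation. Item C asks what the inequality forces, and subtracting x from both sides shows that it constrains y alone:

```latex
\begin{aligned}
\text{A:}\quad & 6x + 5 = 4x + 7 \;\Rightarrow\; 2x = 2 \;\Rightarrow\; x = 1 \\
\text{B:}\quad & 9(1 + 5x) + 13 = 9 + 45x + 13 = 45x + 22 \\
\text{C:}\quad & x - y > x + y \;\Rightarrow\; -y > y \;\Rightarrow\; 0 > 2y \;\Rightarrow\; y < 0
\end{aligned}
```

Because x cancels, the inequality implies nothing about x. This rules out the options x > 0, x = 0, and x = y.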

Problem Solving 

Although students at all three age levels could solve simple one-step word problems, they experienced difficulty with any nonroutine problem that could not be solved by the simple application of a familiar procedure. The results summarized in Table 7 illustrate the difficulty that students had with problems involving several steps. Most third-grade students could solve the one-step problem in Item A. In fact, performance on this item was comparable to performance on similar computation items. There was a significant drop in performance on the two-step problem in Item B, in spite of the fact that 85 percent of the third-grade students could perform the additional computation required for this problem.
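The extra step in Item B is only a sum before the subtraction. The short sketch below works both items (prices in cents; the function name is illustrative). Both items have the same answer, which is consistent with the point that the drop in performance comes from the added step rather than from a harder calculation.

```python
def change_from_a_dollar(*prices_in_cents):
    """Change, in cents, from $1.00 after buying the given items."""
    total = sum(prices_in_cents)  # Item B's extra step: add the prices first
    return 100 - total

item_a = change_from_a_dollar(94)      # one step: 100 - 94
item_b = change_from_a_dollar(35, 59)  # two steps: 35 + 59, then 100 - 94
print(item_a, item_b)  # 6 6
```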



Table 7

Performance on Problem Solving Items

                                                      Percent Correct
Item                                                  Grade 3   Grade 7

A. Robert spends 94 cents. How much change should     68        -
   he get back from $1.00?
B. Chris buys a pencil for 35¢ and a soda for 59¢.    29        77
   How much change does she get back from $1.00?



Table 8

Applications of Skills

                                                            Percent Correct
Item                                                        Grade 11

A. Here are the ages of six children: 13, 10, 8, 5, 3, 3.   72
   What is the average age of these children?
B. Edith has an average (mean) score of 80 on five tests.   24
   What score does she need on the next test to raise
   her average to 81?
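The two items in Table 8 make different demands. Item A is the standard procedure: add the values and divide by the count. Item B requires reasoning about what an average means. An average of 81 over six tests requires a total of 6 × 81 = 486 points, and Edith already has 5 × 80 = 400. The following sketch works both items (function names are illustrative):

```python
def mean(values):
    """Arithmetic mean: the standard procedure that Item A calls for."""
    return sum(values) / len(values)

def score_needed(current_mean, tests_taken, target_mean):
    """Score on the next test that brings the mean of all tests to target_mean.

    New total needed: (tests_taken + 1) * target_mean.
    Points already earned: tests_taken * current_mean.
    """
    return (tests_taken + 1) * target_mean - tests_taken * current_mean

print(mean([13, 10, 8, 5, 3, 3]))  # 7.0  (Item A)
print(score_needed(80, 5, 81))     # 86   (Item B)
```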



The pair of problems in Table 8 contrasts eleventh-grade students' performance on a problem in which a standard procedure is sufficient with their performance on one in which an understanding of the concept of average is needed.

For each of the previous NAEP mathematics assessments, performance on items assessing students' problem-solving abilities has provided the greatest cause for concern in the United States. This concern continues with the poor performance on multi-step problems and on nonroutine problems in the fourth assessment.

Instructional Indicators

In addition to administering items that measure student achievement in specific areas of the mathematics curriculum, NAEP gathers data that might be more generally considered indicators of instructional activity in school mathematics.

Table 9

Performance on the Same Items With and Without Calculators

                                  Percentage of Items Correct
Grade    Number of Items          With Calculator    Without Calculator
3        11                       69                 51
7        30                       61                 48
11       32                       75                 67



Mathematics Course Enrollment

At the time of the fourth NAEP mathematics assessment, over three-fourths of the 17-year old students in the sample reported that they were currently enrolled in a mathematics course. Moreover, the general trend appears to be toward taking more advanced mathematics courses. After previous declines, reported enrollments in Algebra II and more advanced mathematics courses (e.g., precalculus and calculus) increased between 1982 and 1986. Despite the increase, however, the data indicate that over 50 percent of 17-year olds had not enrolled in Algebra II, and almost 40 percent of this age group reported not having taken any mathematics course beyond Algebra I. In recent years, many states have increased the amount of mathematics required for graduation from secondary school. The proportion of students taking the more advanced mathematics courses, however, remains considerably less than 10 percent.

Classroom Instructional Activities 

The fourth NAEP mathematics assessment included a variety of student background questions about the types of instruction in mathematics classes. For students at all three grade levels, typical mathematics instruction apparently consists of listening to teacher explanations, watching a teacher work problems at the board, using a mathematics textbook, and working problems presented on worksheets. About two-thirds of the seventh-grade students and over one-half of the third-grade and eleventh-grade students reported never working in small groups to solve mathematics problems. Although students reported a strong likelihood of working alone in mathematics class, approximately 80 percent of the students at all three grade levels reported that they never work on independent projects or laboratory activities in mathematics class. In general, these recent data suggest that little has changed in U.S. school mathematics instruction over the past decade.

Technology 

The rapid growth in the availability of technological tools in U.S. society at large suggests that a parallel growth might have been expected in the nation's schools. Some data from the fourth NAEP mathematics assessment provide glimpses of the extent to which technology has had an impact on mathematics instruction in the United States.

Almost all students at the three grade levels reported having access to a calculator at home, but relatively few reported having a calculator made available for use in mathematics class. In fact, about two-thirds of the seventh-grade students and approximately one-half of the third-grade and eleventh-grade students reported never having used a calculator (even their own) in mathematics class. When calculators were used in mathematics class, they were reportedly used most frequently to check answers. These data suggest that schools are lagging far behind the rest of American society in making calculation tools available and in utilizing their potential for instruction and learning.

In addition to asking students about their use of the calculator for various tasks, NAEP gave a common set of problems to two equivalent samples of students, one sample using calculators and the other not. Students using calculators consistently performed better than students without calculators at all three grade levels. The difference, however, diminished with age. The relative performances of the two samples are given in Table 9.
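The narrowing calculator advantage can be read off Table 9 by subtracting the two percent-correct columns for each grade. Each grade answered a different set of items, so these differences compare grades only loosely.

```python
# Table 9: grade -> (number of items, % correct with calculator, % without).
TABLE_9 = {3: (11, 69, 51), 7: (30, 61, 48), 11: (32, 75, 67)}

advantage = {
    grade: with_calc - without_calc
    for grade, (_, with_calc, without_calc) in TABLE_9.items()
}
print(advantage)  # {3: 18, 7: 13, 11: 8}
```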

Although students did better on the straightforward computation items given in the calculator assessment when calculators were available, overall performance on items for which calculators were available declined significantly for the two younger groups across the past three assessments.

The data on computer use and impact are somewhat more encouraging. Nearly one-half of the 13-year olds and more than one-half of the 17-year olds reported having access to computers to learn mathematics. This represents a major increase over previous assessments. The data do not show how often students have access to computers or for what purpose. Nevertheless, it is clear that computers are increasingly available in U.S. schools for use in mathematics and that they are being used, at least sometimes, to enhance students' mathematical problem-solving activities.



Conclusion 

Following the broad declines in student achievement that characterized the 1970's, there appears to have been an upturn in mathematics achievement in the United States in the 1980's. This trend provides little cause for complacency, however, as most of the progress occurred in the domain of lower-order skills. Student achievement at all age levels showed serious deficiencies. The discrepancy between students' desired and actual levels of mathematics proficiency begins early in schooling and increases as they move into the upper grades. For minority students, whose mathematics performance has tended to lie below national averages in NAEP assessments, the discrepancy between expected and actual performance for all age groups remains even larger than that for the nation as a whole, despite considerable gains since the last assessment.

The indications of a general increase in participation in advanced mathematics coursework are cause for hope for increased mathematical proficiency in the future. However, the emphasis on computational skills that generally characterizes school mathematics in the United States has left many students with serious gaps in their knowledge of basic underlying concepts. These deficiencies prevent students both from flexibly applying their knowledge and skills and from learning more advanced knowledge and procedures. Moreover, many of the skills that they have learned are in danger of becoming obsolete as technological advances alter the mathematics that adults need to function productively in society.

The curriculum reforms proposed by the National Council of Teachers of Mathematics (1987) in its recently released Curriculum and Evaluation Standards for School Mathematics (draft) call for a reorientation of the school mathematics curriculum to place greater emphasis on helping students to become mathematical problem solvers and to communicate and reason mathematically. The results of the fourth NAEP mathematics assessment indicate that these are the areas most critically in need of reform. Narrowing the gap between the current state of student achievement and classroom instruction and what they should be constitutes a major challenge for American education.
THE GRADED ASSESSMENT IN MATHEMATICS PROJECT



Margaret Brown 



The Graded Assessment in Mathematics project, which forms a classroom-based scheme of continuous and progressive assessment for students aged 11-16, is described in this paper. Successes and problems are briefly outlined, along with plans for future development.

Aims

The Graded Assessment in Mathematics (GAIM) project aims to produce a continuous assessment scheme which will record the mathematical progress, between the ages of 11 and 16, of students across the whole range of attainment. The intention is to take a broadly constructivist stance and thus focus on what the student understands and can use at a particular point in time, rather than on what has been taught. The scheme has therefore been designed to be implemented alongside a variety of teaching schemes and teaching organisations, although an assumption has been made that the mathematics curriculum follows the broad principles laid down in the Cockcroft Report (Department of Education and Science [DES], 1982).

At a more detailed level, the aims of the Graded 
Assessment in Mathematics project include: 

1. To provide an explicit, continuously updated record of the mathematics that students know, understand, and can apply, in order to:

- help teachers better match the curriculum to the student;

- help students become aware of their progress, and more actively involved in their own learning;

- provide information for parents, heads, colleges, and employers, in whatever degree of detail is desired.

2. To encourage a curriculum which conforms to the recommendations of the Cockcroft Report and in particular includes:

- investigations and practical problem-solving;

- discussion, group work, and extended work;

- a focus on process as well as content;

- emphasis on understanding and applying concepts, rather than on knowledge of specific techniques;

- a broad range of mathematical ideas.




In addition to these educational aims, in order to 
be attractive to teachers it was necessary to have a 
third, more pragmatic aim: 

3. To link into other assessment and curriculum schemes by the provision of:

- the facility to convert the continuous record into a summative grade which is accepted as valid, without any supplementary final examination, in the General Certificate of Secondary Education (GCSE), the new national examination at age 16+;

- graded assessment profile certificates from one of the national examination groups, which will feed into the Record of Achievement schemes at present being trialled as a feature of government policy (DES, 1984, 1987a);

- specific guidance for teachers who wish to integrate the GAIM assessment scheme with the most popular published curriculum schemes;

- a close match with the structure of the proposed national curriculum (DES, 1987, 1988a, 1988b), which will enable GAIM to be used as the teacher-assessed element of the national assessment at 7, 11, 14, and 16.

Background 

The development of "graded tests", which more recently have evolved into "graded assessments", constitutes a significant innovation in classroom assessment. It has taken place in England during the last decade, starting in the mid-nineteen-seventies in the area of modern languages (Harrison, 1982; Pennycuik & Murphy, 1988).

Teachers of modern languages in England had experienced a problem of student drop-out which became particularly severe after the change to all-ability comprehensive schools. Feeling a need both to increase motivation and to provide evidence of achievement for students who did not continue long enough to sit the public examinations at 16+, a number of local teacher groups developed systems of graded tests, following the model of examinations for professional interpreters. The characteristics of graded tests were defined as:

- progressive, with short-term objectives leading on from one to the next;

- task-oriented, relating to the use of language for practical purposes;

- closely linked into the learning process, with pupils or students taking the tests when they are ready to pass. (Harrison, 1982)

In fact, most of the graded test systems shared two other common features. First, they were generally organised into a set of successive levels, designed in most cases so that the median student might expect to pass level 1 at the end of year 1 in the secondary school, level 2 at the end of the second year, and so on. Second, they were not only task-oriented, but also incorporated at each level a set of objectives (grade criteria). Thus most graded test systems were criterion-referenced, with criteria which tended towards the active and the practical.

The first evaluation of a graded test system was extremely positive in terms of the attitudes of students, teachers and parents (Buckby et al., 1981). Rises of the order of 20 percent in the numbers of pupils continuing with the study of modern languages were reported fairly consistently across different schools.

Not surprisingly, evidence of improvement in student motivation on this scale attracted attention among officers in local education authorities, two of which went on to form consortia with examination boards in order to develop graded test systems in the major subjects.

The GAIM project is the mathematics scheme which forms part of the London consortium, in which the partners are the Inner London Education Authority, King's College London, and the London East Anglia Group for GCSE (which includes the University of London School Examinations Board). Within the consortium, parallel graded assessment schemes are under development in science, modern and community languages, English, and craft, design and technology. The GAIM project also receives generous funding from the Nuffield Foundation.

By 1983, approaches to assessment had become much broader, incorporating, for instance, the possibility of observation by teachers during classwork and assessment negotiated between pupil and teacher. The term "graded tests" was therefore superseded nationally by "graded assessment" so as to allow such non-test methods where appropriate.

Research Basis 

The notion of levels of attainment in mathematics, which provides the basis for a graded assessment scheme, was the subject of a large-scale investigation of secondary students' understanding of mathematics undertaken at King's College London (previously Chelsea College) in the nineteen-seventies (Hart, 1981; Hart, Brown & Küchemann, 1985). The principal findings of this study, "Concepts in Secondary Mathematics and Science" (CSMS), were:

• in spite of the fact that students had often followed a similar curriculum, the range of attainment across any single age group was very large;

• progress from one year to another was relatively slow, particularly so in relation to the attainment range in any single age group;

• many students were using only rather primitive mathematics, much of which had been taught in the infant school (ages 5-7);

• students rarely made use of methods taught at school, but preferred their own idiosyncratic methods, which were often specific to a particular problem and not generally applicable;

• within each topic (such as ratio, graphs, etc.) a series of from 4 to 7 levels of attainment could be differentiated, and students appeared to progress through these levels in a consistent way (i.e. even with 7 levels, not more than 7 percent of students appeared to have achieved one level without achieving all the levels below it).
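The consistency criterion in the last finding can be stated as a simple computation. The sketch below is illustrative only: it assumes each student's results on one topic are recorded as the set of levels achieved, and the six records are invented, not CSMS data. It reports the percentage of students who achieved some level without achieving every level below it.

```python
def is_consistent(levels_achieved: set[int]) -> bool:
    """True if every achieved level has all the levels below it achieved too."""
    if not levels_achieved:
        return True
    top = max(levels_achieved)
    return levels_achieved == set(range(1, top + 1))


def percent_inconsistent(students: list[set[int]]) -> float:
    """Percentage of students who skipped at least one lower level."""
    inconsistent = sum(1 for s in students if not is_consistent(s))
    return 100.0 * inconsistent / len(students)


# Invented results for six students on a topic with 7 levels.
students = [
    {1, 2, 3},
    {1, 2},
    {1, 2, 3, 4, 5},
    {1, 3},  # achieved level 3 without level 2
    set(),
    {1, 2, 3, 4},
]
print(f"{percent_inconsistent(students):.1f}% of students inconsistent")
```

Under the CSMS finding, this percentage stayed at or below about 7 percent even when a topic was divided into 7 levels.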

The work of the CSMS study has identified the low correlation between what the Second International Mathematics Study terms "the implemented curriculum" and "the attained curriculum". This highlights the need for an accurate record of what each individual student knows, understands, and can apply, in order to assist teachers in reducing the degree of curriculum mismatch.

The CSMS study also provides considerable data as to which mathematical concepts and skills might be included at each level of a graded assessment scheme, once the definition of the levels is agreed. Other information used to assist in this task derived from the results of other research projects at King's College (Dickson, Brown and Gibson, 1984; Booth, 1984; Hart, 1984; Kerslake, 1986; Denvir and Brown, 1986; Hart et al., in press). Further survey data were available from the Assessment of Performance Unit (DES, 1986) and from the examination boards.



GAIM Structure: Levels 

It was necessary near the start of the GAIM project to determine the number of levels to be included in the scheme. The precedent of about one level per year, which had been set by the modern languages graded test schemes, seemed to have proved satisfactory; it was felt that with any fewer levels than this, students would become discouraged. The results of a government-funded evaluation study of a graded test scheme for low attainers in mathematics later confirmed this (Close & Brown, 1988).

However, in contrast to the study of modern languages, which normally begins in England at age 11, there was considerable evidence that students were already at many different levels in mathematics at the start of secondary school. The Cockcroft Report (DES, 1982), drawing partly on CSMS data, suggests a range of seven years of achievement for students at 11. In fact, closer scrutiny of the CSMS data suggests that the range is at least 10 years since, while low-attaining 11 year olds behave mathematically like 7 year olds, advanced 11 year olds are considerably ahead of average 14 year olds.

Hence it was decided that GAIM should have 15 levels, allowing for the brightest students at 11 to make a further 5 years' progress. Having studied data on examination grades in the previous national examinations at 16+, it seemed reasonable to identify the last seven GAIM levels with the seven grades of the GCSE, since the median 16 year old previously obtained the grade which would be equivalent to level 10. The positioning of the boundaries for the earlier GAIM levels was done by reference to CSMS data, gathered on each year group from 11 to 15, so as to maximise the chances of students progressing at the rate of one level per year.
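The correspondence between the top seven GAIM levels and the GCSE grades can be written as a small lookup. This is a sketch under a stated assumption: the seven GCSE grades of the time are taken to be the letters A (highest) to G (lowest), so that level 9 maps to G and level 15 to A. The function name is hypothetical.

```python
from typing import Optional

GCSE_GRADES = "GFEDCBA"  # assumed: seven grades, listed lowest (G) to highest (A)
FIRST_GCSE_LEVEL = 9     # levels 9-15 are the last seven of the 15 GAIM levels


def gcse_equivalent(level: int) -> Optional[str]:
    """Return the GCSE grade identified with a GAIM level, or None below level 9."""
    if not 1 <= level <= 15:
        raise ValueError("GAIM has levels 1 to 15")
    if level < FIRST_GCSE_LEVEL:
        return None
    return GCSE_GRADES[level - FIRST_GCSE_LEVEL]


for level in (8, 9, 10, 15):
    print(level, gcse_equivalent(level))
```

Under this assumption, level 10, the level equated with the grade previously obtained by the median 16 year old, corresponds to grade F.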



The result of fixing the levels in this way is that the earlier levels are "closer together" in mathematical terms than the later levels, since students at the later levels are expected to make greater progress in each year.

Although the earlier work at King's College (Hart, 1981; Denvir and Brown, 1986) does support the idea of hierarchies of learning, with the order in which conceptually based skills are developed relatively invariant within specific local branches, there is evidence that not all children progress uniformly across all mathematical areas. Also, the hierarchies may only hold for a particular education system at a particular time.

For these reasons, GAIM discourages teachers from teaching and assessing one level at a time, as was assumed in the earlier graded test model. Instead it is suggested that students' current records contain details of several adjacent levels, so that teachers may, on a particular topic, assess how high students can go in their attainments, even if this is at higher levels than those at which students are generally working.

The learning theory assumed is thus a constructivist one, in the simple if not in the radical sense (Kilpatrick, 1987), in which it is expected that children gradually construct their own mathematical knowledge in relation to the experiences they have had. They should not therefore be assessed only on the parts of the curriculum they have recently been presented with, as is the current custom. To give weight to this recognition of diversity in learning patterns, students will receive customised profile certificates recording the highest level reached in each separate topic.
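A profile certificate of this kind can be derived from a student's record of achieved criteria. The sketch below is hypothetical throughout: the criterion codes and the small criteria bank are invented, and it assumes that a level counts as reached in a topic once every criterion listed for that topic at that level has been achieved. In keeping with the uneven progress described above, lower levels are not required to be complete.

```python
from collections import defaultdict

# Hypothetical fragment of a criteria bank: (topic, level) -> criterion codes.
CRITERIA = {
    ("number", 1): {"N1a", "N1b"},
    ("number", 2): {"N2a"},
    ("number", 3): {"N3a", "N3b"},
    ("space", 1): {"S1a"},
    ("space", 2): {"S2a", "S2b"},
}


def highest_levels(achieved: set[str]) -> dict[str, int]:
    """Highest level per topic at which all of that level's criteria are achieved.

    A topic with no complete level is reported as 0.
    """
    best: dict[str, int] = defaultdict(int)
    for (topic, level), codes in CRITERIA.items():
        # codes <= achieved tests that every criterion at this level is achieved
        best[topic] = max(best[topic], level if codes <= achieved else 0)
    return dict(best)


record = {"N1a", "N1b", "N3a", "N3b", "S1a", "S2a"}
print(highest_levels(record))
```

Here the student is credited with level 3 in number despite an incomplete level 2, and with level 1 in space, because criterion S2b is still missing.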





Figure 1. The expected progress of students at the 10th, 50th and 90th percentiles respectively. (Chart not reproduced: GAIM level, on a scale of 1 to 15, expected at the end of each of the 1st to 5th years for a 'low-attainer' (labelled 90th percentile), an 'average' student (50th percentile) and a 'high-attainer' (10th percentile).)

GAIM Structure: Content and Process



Assessment and Recording 
The GAIM assessment scheme is based on two components: topic criteria and coursework activities.

Topic criteria are a bank of profile statements, or objectives, which aim to describe the mathematics that students know, understand and can apply. They are organised into the 15 levels of difficulty described above, with about 20-35 criteria at each level, and into six topic areas: logic; measurement; number; space; statistics; and algebra and functions.

Within each topic area the criteria are arranged 
within topic strands developing through the levels, 
enabling the structure of the scheme to be readily 
appreciated by teachers and students. Insofar as is 
possible, topic criteria relate to conceptually based 
skills and processes and not to rehearsed techniques. 

Examples of topic criteria:

Can give an answer to a general question, or decide when a conjecture is true, by producing and/or testing some specific examples. (Logic, level 3)

Appreciates that multiplication by a number less than 1 reduces. (Number, level 11)

Coursework activities are open activities which are designed to encourage student decision-making. It was deemed necessary to incorporate such activities to stimulate the synthesis of skills, since without them the topic criteria might encourage fragmented teaching. Activities are suitable for a wide attainment range, allowing students to tackle the problem at their own level. They can be used with whole classes, with groups, or with individuals. There are two types of activity, investigations and practical problems, reflecting the pure and applied aspects of mathematics respectively.

Examples of investigations and practical problems: 

Investigating the different symme- 
try patterns obtained by shading 
squares on a grid. 

Investigating the number of ways of 
giving change for different sums of 
money. 

Planning the layout of newspaper
advertisements.

Scheduling the manufacture of 
garments in a small co-operative. 



At one level, the GAIM materials (GAIM team, 1988) can be used in any way a school wishes. For example, teachers may use the topic criteria to help write their own school profiling scheme, or use the coursework activities alone to help assess the attainment level of the students. However, if schools want their students to receive certificates recording their performance from the examination group, and to use the scheme as an alternative way of gaining grades in the national GCSE examination at 16+, the school must be accredited by the examination group and visited regularly by an external assessor, who checks that the school is carrying out assessment procedures properly and is using standards equivalent to those of other schools.

Coursework activities are assessed according to the overall level of the work. Because of the degree of openness of the activities, it is not possible to give precise instructions for marking; nevertheless, guidelines are provided for each task as to the sort of performance expected for each of the levels, together with diverse examples of students' work. In practice, after a short training session, teachers find these easy to use and a satisfactory degree of agreement is achieved. The teachers' notes receive extensive trialling before publication. It is intended that teachers experienced in using the scheme should be encouraged to use activities from other sources, and work is progressing on a more general set of guidelines for this.

Topic criteria should ideally be assessed as a result of the student's performance in open activities, since the fact that students can apply their knowledge in such situations provides reliable evidence for its acquisition. The teacher's notes for each activity list those topic criteria that trialling has shown are most likely to be demonstrated.

In practice only a minority of criteria are assessed in this way, and the remainder have to be assessed during normal classwork. Because there is a wide range across different schools in teaching organisation and teaching material, it is not realistic to prescribe exactly how assessment of each criterion should be carried out. Hence in the end it must be left to the external assessor to check that procedures are appropriate.

To assist assessors, certain safeguards are built in. For example, written evidence must exist for at least 50 percent of the criteria for any student. A single written item alone, however, is not considered sufficient, since students are expected to be able to apply knowledge in different contexts. For the same reason, teachers are asked to ensure a delay of at least two weeks, and preferably more, between any direct teaching related to a criterion and student assessment.
This still gives teachers the opportunity to assess some items orally and/or practically, without the requirement of written work by the student. Sets of criteria written so as to be comprehensible to students are provided to encourage students to record their own achievements and to tell the teacher when they feel they have attained a criterion. The teacher, however, still needs to demand and judge the supporting evidence.
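The two numerical safeguards described above, written evidence for at least 50 percent of the criteria and a delay of at least two weeks between direct teaching and assessment, lend themselves to a simple check of a student's record. A minimal sketch, assuming each assessed criterion is stored with a written-evidence flag, the date it was last taught directly, and the date it was assessed; the field names, criterion labels and dates are hypothetical, and the further rule that a single written item is not sufficient is not modelled.

```python
from dataclasses import dataclass
from datetime import date

# Thresholds taken from the text; field and function names are hypothetical.
MIN_WRITTEN_FRACTION = 0.5  # written evidence for at least 50% of criteria
MIN_DELAY_DAYS = 14         # at least two weeks between teaching and assessment


@dataclass
class Assessment:
    criterion: str
    written_evidence: bool
    last_taught: date  # last direct teaching related to the criterion
    assessed: date


def check_record(record: list[Assessment]) -> list[str]:
    """Return the safeguard problems found; an empty list means none."""
    problems = []
    written = sum(a.written_evidence for a in record)
    if record and written / len(record) < MIN_WRITTEN_FRACTION:
        problems.append(f"written evidence for only {written} of {len(record)} criteria")
    for a in record:
        gap = (a.assessed - a.last_taught).days
        if gap < MIN_DELAY_DAYS:
            problems.append(f"{a.criterion}: assessed {gap} days after teaching")
    return problems


# Invented example record for one student.
record = [
    Assessment("Logic 3a", True, date(1988, 3, 1), date(1988, 3, 21)),
    Assessment("Number 11b", False, date(1988, 3, 7), date(1988, 3, 14)),
    Assessment("Space 4c", False, date(1988, 2, 1), date(1988, 3, 1)),
]
for problem in check_record(record):
    print(problem)
```

A check like this could only flag records for attention. As the text notes, judging the evidence itself remains with the teacher and the external assessor.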

Although no formal evaluation of GAIM has yet been carried out, many teachers have reported increases in student motivation as a result of students participating more actively in their own assessment and being aware of short-term goals which are perceived to be realisable. Teachers have also reported that they have gained much more insight into what students know and what they find difficult as a result of focussing on individual achievement in a detailed way, whereas previously their assessment related closely to their own teaching. Against this, teachers have found the scheme a radical step which is quite onerous in terms of workload, particularly to begin with.

The Development Programme

The GAIM project has been evolving over five years and has one more year of the development phase to go before its running is taken over entirely by the London East Anglia Group of examination boards. During that time hundreds of teachers have contributed to its development: some generously seconded full-time for one or two years by the Inner London Education Authority or other authorities, many with regular half-day releases to attend feedback and development meetings, and others corresponding with us or attending occasional weekend or one-day conferences. Over seventy pilot schools, organised in clusters in twelve local authorities, will be working with the project from September 1988. This will be about 20 more schools than are currently involved.

A development package is now published with material for the first eight levels (GAIM team, 1988); the complete publication is planned for 1990. We are still working on new support materials to assist teachers in running the scheme, and will be closely monitoring the first awarding of GCSE certificates in 1989.

A major need will be to modify the scheme so that it will provide one form of the teacher-assessed component of the national assessment, due to begin operation for 14 year olds in 1991. This should be a reasonably simple task, since GAIM, along with the London-based graded assessment schemes in other subjects, is acknowledged to have been chosen as the model adopted, with some regrettable modifications to fit with government policy, for the national assessment scheme.
AN INFORMAL DIAGNOSTIC INSTRUMENT FOR ALGEBRA:
RATIO AND PROPORTION 

Douglas Edge 



Collection of information-rich data is extremely important to diagnosticians and researchers. In mathematics education this is typically accomplished either by asking students to complete a pencil-and-paper test or by speaking with students using think-aloud or structured interview techniques. Each approach has its strengths and weaknesses. Pencil-and-paper tests permit large-scale data collection yet cannot provide much more than superficial error-pattern analysis. Interview techniques facilitate collection of more detailed information, often focussing on conceptual development, but the time-consuming nature of interviewing means that the number of students involved in such studies must remain relatively small. The major purpose of this study was to investigate whether or not a diagnostically oriented test could be developed which would incorporate the advantages of both approaches to data collection.

The topic chosen for this study was "concepts in early algebra", and the study focussed on students in Grades 7 and 8; that is, students in the two years immediately preceding entry into secondary school in Ontario. The test developed consisted of twelve questions: four focussing on students' notions of equation, four on variable, and four on ratio and proportion. This paper reports on the questions relating to ratio and proportion.

Background 

Writing in mathematics classes is not new. Burton (1985) outlined different forms that such writing could take to promote the development of intellectual skills essential to the understanding of the discipline. Stempien and Borasi (1985), like Burton, focussed on writing as a learning tool by asking students to write mathematics-related stories, essays, and diaries. They concluded that writing provided opportunities for students to clarify their understanding of concepts and helped students organize their ideas.

Gordon (1988) investigated the use of writing strategies with students enrolled in developmental-studies algebra classes. He compared three classes where students had to write about their mathematics assignment with three other classes, which were either given extra exercises or simply discussed previously assigned work. Although Gordon was cautious about attributing his findings to the differences in treatments, he acknowledged the value of the writing strategy. Bright (1988), also working with college-level students, studied story editing as a methodology for identifying conceptual understanding in geometry. He found story editing helpful in that it revealed otherwise undetected misconceptions.

Specific to testing, Ashlock (1987) pointed out that pencil-and-paper tests can be used to examine both skills and concepts, but that it is much more difficult to design test items that permit us to infer students' understanding of a concept. He provided examples of three different types of pencil-and-paper items that would be suitable for diagnosing conceptual understanding. The items involved sentence completion for ideas or rules, symbolization for statements made up of numerals and signs, and portrayals for drawings that model the concept in some way. Olson (1987) designed an algebra readiness assessment device that, like Ashlock's, used diagrams, comparisons, and so on, but also asked students to explain their choices. Olson's test included items related to class inclusion, transitivity, concept of equation, and proportional reasoning.

The major focus of this study was students' ability to explain their answers in written form. The specific insights into children's understanding of proportional reasoning were secondary to that focus.

Method 

Ninety-seven students participated in the study: students in Grade 7 ranged in age from 12 years 4 months to 14 years 7 months, and students in Grade 8 from 13 years 5 months to 16 years 7 months (see Table 1). This group represented all the students enrolled in the Grade 7 and 8 classes in an elementary school located in a town in rural southwestern Ontario. The testing occurred during June, the last month of the school year.
Table 1

Number, mean age in months (with standard deviation) and range of ages in months

                        Grade 7        Grade 8
Male      Number        23             24
          Mean age      157.1 (6.3)    170 (7.7)
          Range         148-175        161-199
Female    Number        30             20
          Mean age      155.8 (4.2)    166.9 (3.0)
          Range         150-167        161-173
Total     Number        53             44
          Mean age      156.4 (5.7)    168.6 (6.2)
          Range         148-175        161-199



Prior to administering the test, it was explained to the students that we were trying to find out why some students find mathematics easier to learn than others: "Unfortunately people don't come with zippers in their heads" (an audible "yuck" was heard). "So, we have designed a special test, different from those you usually see. Let's try a sample question together." Students were then asked to discuss which fraction was larger, 1/5 or 1/6, and why, and to practise writing down their explanation. A few sample explanations were discussed. Finally, students were asked to complete the test: "Try your best to help us. There is no time limit." No student took longer than 45 minutes for the twelve-item test.



Questionnaire 

The last four items on the test were the ones related to proportional reasoning. These items are referred to here by the numbers nine through twelve, their item numbers on the original test. See Figure 1 for the four items. Note that on the actual script the items were placed one question per half page. After each question, a four-centimetre work space was given, followed by the statement, "Please explain how you got your answer."



Item 9: If 8 tickets cost $12.00, how much would we have to pay for 6 tickets? 
Please explain how you got your answer. 



Item 10: Chris was asked to trim the trees in the gardens of three families. The Adams' garden had 10 trees;
the Brown's had 15 trees; and the Campbell's had 25 trees. It took Chris 2 hours to trim the trees in the Adams' 
garden. How long would it take to trim those in the Brown's garden? How long would it take to do those in the 
Campbell's garden? 

Please explain how you got your answers. 



Item 11: We measured the heights of two rectangles with sticks. The height of the short one was 4 sticks. The
tall one was 6 sticks. We measured the height of the short rectangle again, this time with loops. The short one 
was 6 loops. How many loops would we need for the height of the tall one? 
Please explain how you got your answers. 



Item 12: One flagpole, which is 8 metres high, casts a shadow 3 metres long. Another flagpole casts a shadow 
5 metres long. How long is the second flagpole?

Please explain how you got your answer. 
Figure 1. Proportional reasoning items on the early-algebra assessment test.



For Item 11, a pair of rectangles, proportionally drawn and approximately two and three centimetres in height, were sketched to the left of the question. For Item 12, a pair of flagpoles, again proportionally drawn and about two and three centimetres in height, were displayed to the right of the question.

The four proportional reasoning questions were not intended to represent a hierarchy of skills. Each question was selected because it either addressed a specific content objective or provided some form of pedagogical insight.

Item 9 was a modification of Hart's (1984) recipe question, which gave ingredients suitable for 8 persons and then asked what would be needed for 4 and for 6. For Item 9 it was decided to provide information about 8 tickets but then ask directly for the information about 6 tickets. Would children opt for doubling and halving strategies? Item 10 was prepared to parallel Hart's "eels being fed sprats" problem. This question was selected because the ratio of the numbers was 2:3:5, which would possibly provide more variation in answers and explanations than an easier ratio, say 1:2:3. Item 11 was an adaptation of Karplus' "Mr. Tall, Mr. Short" question that Hart used in her Concepts in Secondary Mathematics and Science (CSMS) (Hart, 1981) and Strategies and Errors in Secondary Mathematics (SESM) (Hart, 1984) studies. Rectangles were used for this item because they could be drawn with the same width; unlike the Mr. Tall/Mr. Short figures, they varied proportionally only in height. The numbers in the proportion 4:6 = 6:n were considered particularly good in that they could facilitate either additive or multiplicative solution strategies. Item 12 was written to incorporate a ratio where one term was not a multiple of the other. Students could not simply use doubling and halving, or factor techniques, to find the height of the second flagpole. Would "difficult numbers" result in different strategies? Would students be able to explain the fractional aspects of the problem?
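The numbers chosen for Items 9, 11 and 12 can be checked by solving each missing-value proportion a : b = c : x exactly. The sketch below uses Python's standard fractions module. For Item 11 it also computes the additive answer that the numbers invite (4 sticks to 6 sticks is "two more", so 6 loops would become 8 loops) beside the multiplicative answer. The helper names are illustrative, not taken from the study.

```python
from fractions import Fraction


def missing_value(a, b, c):
    """Solve a : b = c : x multiplicatively, returning x = b * c / a exactly."""
    return Fraction(b) * Fraction(c) / Fraction(a)


def additive_answer(a, b, c):
    """The 'adder' strategy: keep the difference b - a instead of the ratio."""
    return c + (b - a)


# Item 9: 8 tickets cost $12, so 6 tickets cost 12 * 6 / 8 dollars.
print("Item 9:", missing_value(8, 12, 6))
# Item 11: the short rectangle is 4 sticks or 6 loops; the tall one is 6 sticks.
print("Item 11:", missing_value(4, 6, 6), "loops; adder answer:", additive_answer(4, 6, 6))
# Item 12: an 8 m pole casts a 3 m shadow; what height casts a 5 m shadow?
height = missing_value(3, 8, 5)
print("Item 12:", height, "metres, about", round(float(height), 2))
```

This prints $9 for Item 9, 9 loops (against the adders' 8) for Item 11, and 40/3 metres, about 13.33, for Item 12. That last result is the kind of fractional answer the "difficult numbers" were meant to elicit.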

Analysis 



Each item was scored using a two-part code. Answers were coded as 0 (incorrect), 1 (correct), or 2 (omitted). Explanations for each answer were also coded, and these codes were developed as needed. For example, if on the first script scored the student provided an explanation such as "I multiplied", that phrase would be assigned the code 01. If on the second script the explanation was "first I divided, then I multiplied", that would be assigned 02. If the explanation on the third was "I multiplied", no new code would be needed. Examples of codes for Item 9 are: 1-01, correct answer followed by "divided 12 by 8 and got the $1.50, then I multiplied by 6 to get the answer"; and 0-06, incorrect answer followed by "I multiplied 6 by 12." For Item 9 a total of 18 explanation codes were needed; for Items 10, 11 and 12, the numbers of codes needed were 19, 10, and 22, respectively. These codes were then categorized and summarized. Coding, categorizing, and summarizing data in this manner facilitated further analysis in two ways: on a question-by-question basis across all students, and on a student-by-student basis across each set of items on any one script.

Results

Item 9: The analysis of data with the code summary for this item is presented in Table 2. Of those who answered this item correctly, 22 of 29 Grade 7s and 23 of 26 Grade 8s chose some form of unitary analysis. No student used the traditional unitary-analysis form of writing equivalent sentences one beneath the other. Most showed $12 ÷ 8 = $1.50 and $1.50 × 6 = $9.00. For explanations, students typically stated that they had divided the 12 by 8, then multiplied the answer by 6. A variation of this procedure was used by two students who divided the 12 by 8 to obtain the $1.50 but then multiplied the $1.50 by 2, to equal $3, and then subtracted the $3 from the $12. A third student provided as explanation a kind of mathematical two-step: "I just subtracted and did a little division." In another variation of unitary analysis, three students used guess-and-check strategies, indicating that they guessed at $1.00 for one ticket "which would be $8.00 total", so they "needed" an additional $0.50 per ticket. The other two students in this category gave variations of the same answer: "I guessed $2 for each one which was $16 so the answer had to be $1.50"; and "I just guessed $1.50 for each ticket and it happened to be right." On the incorrectly answered items, the majority of students used the numbers in the problem but performed inappropriate operations. Typical of these responses are $10 ("I subtracted $2 from $12"), $72 ("I multiplied 6 × 12"), and $2 ("12 ÷ 6 = 2 so the cost is $2"). The students' arithmetic explanations were clear; what is not clear is why they chose particular operations. The other major category of incorrect responses is "correct reasoning, wrong answer." In all six cases the error occurred when dividing the $12 by 8 (for example, 12 ÷ 8 = $1.45, and 6 × $1.45 = $9.70).
Table 2

Analysis of data with code summary for Item 9

Code  Status/Explanation                   Grade 7 (n=53)   Grade 8 (n=44)
2     Omitted                              6 (11%)          9 (20%)
1     Correct                              29 (55%)         26 (59%)
        unitary analysis stated            23               22
        answer a guess                     4                1
        no explanation                     2                2
        correct answer, wrong reasoning    0                1
0     Incorrect                            18 (34%)         9 (20%)
        correct reasoning, wrong answer    3                3
        wrong operation                    13               3
        partially correct                  2                1
        incomplete                         0                2
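The tallies in Table 2 rest on the two-part codes described under Analysis. As an illustration only, the sketch below mimics that procedure under a simplifying assumption: two explanations count as the same when their wording matches after trimming and lower-casing, whereas the study's coders judged equivalence themselves. Explanation codes are numbered 01, 02, ... in order of first appearance, the answer status (0, 1 or 2) is prefixed, and the resulting codes are tallied. The class name and sample scripts are hypothetical; the three explanations are the examples given in the Analysis section.

```python
from collections import Counter

STATUS = {"incorrect": 0, "correct": 1, "omitted": 2}


class ExplanationCoder:
    """Assigns explanation codes 01, 02, ... as new explanations appear."""

    def __init__(self) -> None:
        self.codes: dict[str, str] = {}

    def code(self, status: str, explanation: str) -> str:
        key = explanation.strip().lower()  # crude stand-in for a coder's judgement
        if key not in self.codes:
            self.codes[key] = f"{len(self.codes) + 1:02d}"
        return f"{STATUS[status]}-{self.codes[key]}"


coder = ExplanationCoder()
scripts = [
    ("correct", "I multiplied"),
    ("correct", "first I divided, then I multiplied"),
    ("incorrect", "I multiplied"),
]
codes = [coder.code(status, text) for status, text in scripts]
print(codes)  # ['1-01', '1-02', '0-01']
print(Counter(codes))
```

Keeping the answer status separate from the explanation code allows the same explanation to be tallied under correct and incorrect answers alike, as the table does.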



Item 10: Analysis of data for this item is given in Table 3. Of the correct responses, 20 of 28 students in Grade 7 and 16 of 24 in Grade 8 reasoned that if it took 2 hours to trim 10 trees then it would take 1 hour for 5, hence 3 and 5 hours, respectively, for the Brown's and the Campbell's. Many in this group did not show any calculations; they wrote their answers and provided an appropriate explanation. Five students changed the hours to minutes, then wrote that it took 120 minutes for 10 trees, so one tree would take 12 minutes, and so on. One student indicated that "if Chris cuts the trees at the same rate, then he will do one and a half times the two hours for the Brown's trees." Two students focussed on multiples of five: "the numbers were multiples of five that could be reduced to lowest terms" and "10 is 2, 15 is 3, 20 is 4 and 25 is 5." For the incorrectly answered items, the most prevalent response was 2 1/2 hours for the Brown's and 4 1/2 hours for the Campbell's. Student #26, for example, wrote "it's 2 hours for 10 trees so 30 minutes is for 5, and that's why 15 is 2 1/2." In the incorrect explanation category "operation/units confusion", one student wrote "You subtract 2 hours from 10 trees giving 8; so now you subtract 2 from 15 and 2 from 25 to get the answers of 13 and 23." Another student indicated that it would take "2 hours + 5 extra trees for 7 hours for the Brown's and 2 + 5 + 10 = 17 for the Campbell's." A sample response in the "explanation unclear" category is "I know what I'm doing, but I just screwed up. Divide by 3, use percent and divide how many trees put into two hours and so on forget it just used vision or percent."
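The two correct strategies reported for Item 10, working from 1 hour per 5 trees and working from 12 minutes per tree, must give the same times. A small sketch checking this with exact fractions; the function names are illustrative, not taken from the study.

```python
from fractions import Fraction


def by_five_trees_per_hour(trees: int) -> Fraction:
    """2 hours for 10 trees means 1 hour for every 5 trees."""
    return Fraction(trees, 5)


def by_minutes_per_tree(trees: int) -> Fraction:
    """120 minutes for 10 trees is 12 minutes per tree; convert back to hours."""
    minutes_per_tree = Fraction(2 * 60, 10)
    return trees * minutes_per_tree / 60


for family, trees in [("Adams", 10), ("Brown", 15), ("Campbell", 25)]:
    a, b = by_five_trees_per_hour(trees), by_minutes_per_tree(trees)
    assert a == b  # both strategies agree
    print(f"{family}: {trees} trees take {a} hours")
```

Both routes give 2, 3 and 5 hours. By contrast, the most common wrong answers (2 1/2 and 4 1/2 hours) cannot come from any constant time per tree.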



Table 3

Analysis of data with code summary for Item 10

Code  Explanation category                   Grade 7 (N=53)   Grade 8 (N=44)
2     Omitted                                7 (13%)          11 (25%)
1     Correct                                28 (53%)         24 (55%)
        found 5 trees per hour               20               16
        changed hours to minutes             1                4
        B = 1 1/2 A, C = 2 1/2 A             1                0
        multiples of 5                       1                1
        correct answer, no explanation       2                2
        explanation unclear                  1                1
0     Incorrect                              18 (34%)         9 (20%)
        used 1/2 additively                  10               3
        correct reasoning, incorrect answer  1                3
        operation/units confusion            2                2
        explanation unclear                  3                1
        wrong answer, no explanation         2                0



Item 11: Refer to Table 4 for the data summary. The overwhelming response for this item fell in the incorrect category "add 2". Students generally explained that the difference in sticks between the heights of the rectangles was two, so the difference in loops was also two. Hart (1981) labelled these students "adders" (p. 94). They tended to see an additive relationship in proportional situations rather than the multiplicative one. In Hart's study about 32 percent of subjects were "adders"; in this study the percentage of adders was almost double that amount. The students who answered this item correctly gave clear and accurate explanations: "Six loops and 4 sticks to the short rectangle would mean 1 1/2 loops to every stick which means 6 loops + 6 halves of a loop = 9 loops." One Grade 8 student focussed on the relative size of the two rectangles. He wrote "the small one was 2/3 the size of the tall one, so 2/3 of 9 is 6." Although clearly representing only a small percentage of the students, this group appeared to have a good understanding of the comparison-by-multiplication aspect of proportional reasoning. Of the incorrect strategies, other than the "adders" discussed above, the next most common answer justification was pattern-based. For example, several students wrote "it's a pattern" and then showed a two-by-two array with the numbers 6 and 8 placed beneath the numbers 4 and 6. Another student explained that "If there were two more the first time, then I add double the second time." A third example of a pattern-based response was given by a student who answered "4 loops because 4 + 6 = 10 for the first, so 6 + 4 = 10 for the second."



Item 12: Data analysis and summary information for this item are presented in Table 5. Comparatively speaking, a large percentage of students omitted this item. It is difficult to know whether this was because students found the item difficult or because they felt pressure to complete the test within a certain time limit. The two students who answered correctly both indicated that one metre would correspond to about 2.6 metres of shadow (8 ÷ 3), so the answer would be about 13 metres (5 × 2.6). One other student whose response I scored as incorrect explained that "8 ÷ 3 is almost 3, and 14 ÷ 5 is also almost 3, so the answer is 14." This response, and several others like it, clearly shows a multiplicatively oriented conceptual understanding of the notion of proportion. Other examples include "You need to double it and add some" and "For every metre you have 0.375 metres of a shadow, so you multiply 8 × 0.375."

Most incorrect responses reflected some form of "additive" thinking. The most common was for students to explain that they had to subtract 3 from 8 for the first flag pole and then add this difference of 5 onto the height of the second flag pole. This form of reasoning accounted for 14 of the 22 additive error responses of the Grade 7s and 8 of the 14 of the Grade 8s. Other additive error explanations involved adding and subtracting other combinations of the numbers 3, 5, and 8.
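The contrast between multiplicative reasoning and the "adder" strategy in Items 11 and 12 can be made concrete with a short Python sketch. It assumes, from the students' explanations, that in Item 11 the short rectangle is 4 sticks high (6 loops) and the tall one 6 sticks, and that in Item 12 a 3 m flag pole casts an 8 m shadow and the second pole is 5 m tall; the function names are hypothetical.

```python
from fractions import Fraction

def multiplicative(a1, b1, a2):
    """Proportional reasoning: keep the ratio b/a fixed, so b2 = a2 * (b1 / a1)."""
    return Fraction(a2) * Fraction(b1, a1)

def additive(a1, b1, a2):
    """The 'adder' strategy: keep the difference b - a fixed, so b2 = a2 + (b1 - a1)."""
    return a2 + (b1 - a1)

# Item 11 (assumed): 4 sticks -> 6 loops; how many loops for 6 sticks?
print(multiplicative(4, 6, 6), additive(4, 6, 6))  # 9 8
# Item 12 (assumed): 3 m pole -> 8 m shadow; what shadow for a 5 m pole?
print(multiplicative(3, 8, 5), additive(3, 8, 5))  # 40/3 10
```

The additive rule reproduces the most common wrong answers (8 loops, a 10 m shadow), while the multiplicative rule gives 9 loops and 13 1/3 m, which the correct responders rounded to about 13 metres.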



Table 4

Analysis of data with code summary for Item 11

Code  Status/Explanation               Grade 7 (N=53)   Grade 8 (N=44)
2     Omitted                          5 (9%)           9 (20%)
1     Correct                          4                6 (14%)
        each stick 1 1/2 loops         3                2
        half more per stick            0                1
        add 3 + 6                      1                0
        small 2/3 of tall              0                1
        correct/no explanation         0                2
0     Incorrect                        44 (83%)         29
        add 2                          30               27
        pattern 4, 6, 6, 8             3                1
        pattern +2, +4                 2                0
        pattern 4 to 6, 6 to 4         3                0
        explanation unclear            3                1
        no explanation                 3                0
Table 5

Analysis of data with code summary for Item 12

Code  Status/Explanation                     Grade 7 (N=53)   Grade 8 (N=44)
2     Omitted                                13 (24%)         17 (39%)
1     Correct                                1 (2%)           1 (2%)
        one metre of shadow is 8 ÷ 3         1                1
0     Incorrect                              39 (74%)         26 (59%)
        errors with multiplicative ideas     2                5
        errors with additive ideas           22               14
        estimate or measured                 2                0
        wrong answer/no explanation          7                5
        explanation not clear                6                1



Analysis by student 

Considering the data on a student-by-student basis permits other forms of analysis (see Table 6). A pattern of CCCC means that the student answered all four items correctly. A CCNN pattern indicates that the student answered the first two correctly, but either answered incorrectly or omitted the last two. Working with the original coding sheet and the explanation categories list, a researcher could select and study a student typical of any one of the pattern groups, or could compare, for example, responses to the first two items of the CCCN pattern group with those of the CCNN group. I have chosen to focus on what appear to be three distinct groups of students: those with good proportional reasoning skills (CCCC, CCCN, and NCCC), those with limited skills (NNNN), and those perhaps in a transitional stage (CCNN, CNCN, CNNN, and NCNN).

There was a small group of students who demonstrated a good understanding of proportional reasoning. Of those in the CCCN category, three omitted the last question. The other three, who made errors on this question, used multiplicative strategies: one is the student previously discussed who wrote that "8 ÷ 3 and 14 ÷ 5 were both almost 3, so the answer is 14"; the second indicated that 8 ÷ 3 was 3 1/3; the third divided 3 by 8, rather than 8 by 3. The one student in the NCCC category is one of the students who divided 12 by 8 incorrectly, getting $1.40 instead of $1.50. Hence, in this group of good proportional thinkers, no student used an additive strategy. Each student consistently applied multiplicative concepts, whether correctly or incorrectly.



Table 6

Frequency of correct and incorrect response patterns for Items 9 through 12

Pattern*   Grade 7   Grade 8
CCCC       0         1
CCCN       2         4
NCCC       1         0
CCNN       17        13
CNCN       1         0
CNNN       9         8
NCNN       8         6
NNNN       15        12

*C: correctly answered; N: incorrectly answered or omitted
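The three groupings used in the analysis by student can be tallied from Table 6 with a few lines of Python. This is a sketch only; it reads the partly illegible Grade 7 CCNN count as 17, the value implied by the Grade 7 total of N = 53, and the dictionary names are hypothetical.

```python
# Counts from Table 6 as (Grade 7, Grade 8); Grade 7 CCNN read as 17 (assumption).
PATTERNS = {
    "CCCC": (0, 1), "CCCN": (2, 4), "NCCC": (1, 0), "CCNN": (17, 13),
    "CNCN": (1, 0), "CNNN": (9, 8), "NCNN": (8, 6), "NNNN": (15, 12),
}

# The three groups of response patterns for Items 9 to 12 described in the text.
GROUPS = {
    "good": {"CCCC", "CCCN", "NCCC"},
    "limited": {"NNNN"},
    "transitional": {"CCNN", "CNCN", "CNNN", "NCNN"},
}

def group_totals(patterns=PATTERNS, groups=GROUPS):
    """Sum the Grade 7 and Grade 8 counts of the patterns in each group."""
    totals = {}
    for name, members in groups.items():
        grade7 = sum(patterns[p][0] for p in members)
        grade8 = sum(patterns[p][1] for p in members)
        totals[name] = (grade7, grade8)
    return totals

print(group_totals())
# {'good': (3, 5), 'limited': (15, 12), 'transitional': (35, 27)}
```

The group sizes add back to the grade totals of 53 and 44, and the limited group contains the 27 NNNN students discussed below.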
Of the 27 students in the NNNN category, one omitted all four items on the instrument and five others omitted three of the four. The majority of errors made by these students were either computational or reflected the application of previously discussed inappropriate strategies.

The students in the third group appeared to be transitional in their understanding. With only one exception, students in this group answered one or both of the first two items correctly. These items did not require multiplicative strategies. Typical of this group is a student with an NCNN pattern who, for Item 9, correctly divided 12 by 8 and multiplied the $1.50 answer by 2, but then multiplied that answer by 2 again to obtain $6.00 for his answer. For Item 10, he gave a correct halving/doubling reasoning strategy.

Discussion 

From a review of the literature, it is clear that some authors advocate writing in prose in mathematics classes for both teaching and testing purposes. This study, which was designed to examine students' ability to respond in writing to ratio and proportion items on a conceptually oriented diagnostic test, lends support to that view. Students explained their work with varying degrees of success. Some wrote terse remarks such as "I guessed." Others described in words the arithmetic operations that they had just completed. A number of students, rather than showing their work, just wrote the answers to questions but did provide appropriate explanations. There were also many students who responded with well-written sentences. Characteristic of these were the guessing-strategy answers like "First I guessed $1.00 but that was not enough to make the $12.00, so I tried $1.50...." These were the most interesting in that they provided a clear indication of what the students said they did.

The writing technique (student written responses followed by coding, categorizing, and summarizing) deserves support as a viable research tool. A major concern with the writing technique, however, is that regardless of how well students described what they did, no student described why a particular strategy or algorithm was chosen. To explain the source of her answer of 72, a student wrote that she multiplied 6 × 12. She did not explain why she multiplied. Certainly, this is a major limitation of this form of data collection. But the writing technique does provide more information than traditional pencil-and-paper or multiple-choice tests. With multiple-choice tests, the researcher must infer what method the student used. With the writing technique, it is generally not difficult to determine what the student is doing. Still, think-aloud and interview approaches would be more beneficial in that students would not likely be able to omit questions and the researcher could ask for clarification of answers where needed. Further, these latter approaches would facilitate probing into why the student chose a particular algorithm or method.

Analysis of the results of this study does provide for interesting comparisons with those of Hart (1981). Proportional reasoning has been the subject of much discussion. Among the chapters in the CSMS project report (Hart, 1981) was one devoted to ratio and proportion. In this chapter, Hart investigated aspects of ratios such as doubling and halving, finding rate per unit, and enlarging drawings in ratios of 2:1 and 5:3. She found that doubling and halving strategies were among the easiest for most children, and enlarging non-rectilinear figures in the ratio 3:2 the most difficult. In the follow-up SESM project, Hart (1984) studied children's strategies and errors found to be common to a large sample of the scripts from the CSMS project. This second project involved diagnosis, analysis, and teaching by the researchers. Although the findings confirmed much of the information obtained from the CSMS project, interviews and teaching allowed more in-depth probing of some misconceptions. For example, there was no evidence that children used a standard ratio and proportion algorithm; they tended to devise their own methods. Post, Behr, and Lesh (1988) also discussed proportionality and focussed on methods likely to be used by students. They noted that proportional reasoning included notions of comparison, covariation, and the ability to process several pieces of information. This is perhaps the most important type of formal reasoning that students acquire during adolescence. Through their research they found that unit rate and factor strategies were the most successfully utilized.

In this study relatively few students used the halving/doubling strategy. When given information about 8 tickets and asked about 6, almost 2/3 of the students opted for a unitary analysis procedure. However, for Item 10, involving the 2:3:5 ratio, more than half of the correct responses were obtained by halving the 2 and then multiplying by the appropriate multiple. It appears that many students could use a halving/doubling strategy, but preferred a unitary analysis approach. The choice not to use the "intermediary 4" question did seem to influence the method chosen by the students.

Comparisons between the rectangles item and Hart's Mr. Tall/Mr. Short question show that "adders" made up a large portion of both populations. It is not clear whether the use of the additive strategy resulted from the influence of the fraction or from some other notions inherent in the question. The concept of enlargement, or similarity, requires further study. With the last item on the test, it is difficult to assess the effect of the 3:5 ratio. A few students' explanations were clear and hence helpful, but to examine the "fraction versus conceptual-understanding-of-proportion concepts" question, this item would have to be matched with comparable items with ratios of 3:6 or 5:10.
The above findings lend support to Hart's identification of levels of understanding of proportional reasoning, where students progress from unitary analysis and factor/multiple methods up to more formal ratio approaches. Further support comes from the fact that, as in Hart's research, no student in this study used the proportional statement a:b = c:d. The teachers reported that their instruction emphasized informal ratio and proportion experiences, but that occasionally they did use this form. Hart found that less than 1 percent of her students used the proportional form.

Is this writing technique a useful strategy? A qualified "Yes" is perhaps the fairest answer. As a strategy it cannot replace interview approaches, but it does appear to offer some advantages over traditional pencil-and-paper tests. In this study, the technique did permit some insights into students' conceptual understanding of proportional reasoning. It illustrated how the test could have been used to place students into groups with good, transitional, or weak ratio and proportion abilities. It also showed how the test might be used to identify various levels of understanding of proportional thinking.

Areas for further research would include investigating whether tests could be designed that would study just one aspect of ratio and proportion in some detail; for example, having eight to ten questions on enlargements alone. These tests would use writing- or explanation-oriented approaches. Other studies could focus on whether the writing technique influences the students' selection of strategy. Is memory decay affected? Might not the writing clarify some concepts? One other area of research that might be considered is related to the format of the test. Would multiple-choice tests combined with requests for children to explain their answers result in different responses from those obtained from the open-response format? Would students respond differently if their response was not one of the options?
ASSESSING PROBLEM SOLVING IN SMALL GROUPS



Derek D. Foxman • Lynn S. Joffe



The work described in this paper took place in the context of national monitoring surveys of 11- and 15-year-old pupils carried out in 1987 in schools in England, Wales, and Northern Ireland. The surveys were conducted under the auspices of the Assessment of Performance Unit (APU) at the Department of Education & Science (DES). About 800 pupils at each age level participated in the problem solving, working in groups of three. These pupils were sampled separately from the main samples of over 10,000 at each age level who took other assessments. Small-group problem solving was included among the assessments because of the growing educational interest in cooperative learning, and in order to gain more information on children's performance in this area than in previous APU surveys (DES, 1985). It was also expected that the problem-solving processes would be more "visible" if externalised in discussion.

The development sought to devise a situation in which cooperative problem solving was likely to occur. A framework for assessment was constructed to enable the performance of groups of pupils to be rated by trained assessors on significant aspects of problem solving. In addition, the assessors had available a scheme to categorise the group's activities, and they wrote detailed observations on the progress of the groups in their attempts to find a solution to each problem. As in other APU work, teachers played a prominent part in the development work and, during the surveys, as assessors.

Factors Facilitating Cooperative Learning 

To what extent is cooperative learning in groups a feature of classrooms in Britain? For some years children in a high proportion of British primary classrooms have been organised in groups of 4, 5 or 6 around tables in a room. The grouping is often by ability for mathematics or reading, and may be mixed-ability or friendship groups for other areas of the curriculum (Great Britain, 1978). However, several research studies have shown that there is a distinction to be made between "grouping" and "groupwork" (Tann, 198/). Rarely do the classroom groups actually engage in collaborative work, nor are they asked to do so. More often they work as individuals; and, although neighbouring children may engage in discussion, this is not necessarily task-oriented (Galton et al., 1980).

In secondary schools grouping is rare, and groupwork even more so. However, Cowie & Ruddock (1986), who have conducted a group work project in secondary schools, point out that the new 16+ examination in Britain, the General Certificate of Secondary Education (GCSE), is encouraging schools to provide more opportunities for cooperative learning. They found that nearly half of the new syllabuses make reference to groupwork in course aims, and 20 percent in both aims and assessment objectives.

The potential benefits to children of learning in groups rather than individually have been of considerable interest to educationalists for some time. This interest stems from various sources: the desire to improve students' motivation, to develop their social and personal skills, or the need to organise learning with scarce classroom resources such as microcomputers. Theoretically, the work of Piaget (1959) and Vigotsky (1978) suggests that interactive situations should provide children with more opportunities for progression in their learning and development than working individually. Whether this can be demonstrated empirically has been the subject of a number of research studies in the past decade. For the purpose of this project, it was important to know what factors were likely to facilitate cooperative working.

Slavin (1983) concluded that there is improved achievement in cooperative learning, but only when the group as a whole is rewarded rather than its members for their individual contributions. The type of task also influences the effectiveness of group situations (Cotton & Cook, 1982). Other factors which could affect group processes and achievement are the ability, racial, and gender mix within the groups. Higher-attaining students could be expected to provide more explanations and more correct solutions to problems, and may have more social influence within the group (Cohen, 1984). However, Webb (1982) reported inconsistent results from a number of studies of groups with similar or mixed attainment composition. Several researchers have noted differences in the behaviour of boys and girls within mixed groups (e.g. Lindow et al., 1985).

If cooperative learning does have more positive effects than individualised learning, to what factors can they be attributed? Piaget's (1959) view was that interaction with a peer pushes a child in the pre-operational stage towards considering more than one perspective on a situation, and so into the more advanced concrete operational stage (Mugny and Doise, 1978). Vigotsky (1978) considered that social interaction generally is a prime cause of intellectual development. Learning creates the "zone of proximal development", which is the distance between what children can do on their own and what they can do under adult guidance or in collaboration with more capable peers (Vigotsky, 1978). Light and Glachan (1985) found that children performed better when working cooperatively on a goal-directed problem-solving task ("Tower of Hanoi") than when working individually, even when there was little verbal interaction. However, a task which produced more discussion ("Mastermind") was even more effective. They found that pairs of children at both 8 years and 12 years produced solutions in fewer moves than children working alone. Furthermore, groups who discussed the problem or argued about its solution most were significantly more likely to produce their solutions in fewer moves than those who argued least. Fletcher (1985) also found that groups were superior to individuals working on a microcomputer problem task, and that verbalising was a facilitatory factor.

Barnes and Todd (1977) conducted a study of the talk of 13-year-old boys and girls while they were working on tasks set by their teacher. They reported that "the quality of discussion typically far exceeded the calibre of their contribution in class...". Barnes and Todd pointed out that talk in a classroom is usually managed by the teacher. In a group without an adult present, it is the children who have to negotiate the course of the discussion, with its episodes of silence and conflict and the need to encourage others rather than to dominate them.

These research studies do not present conclusive 
findings about the factors which might facilitate coop- 
erative working in groups. The gender and ability mix 
and the extent of interactive talk within the group 
obviously needed to be considered. Another factor 
could be the gender of the assessor (Joffe and Foxman, 
1988). The size of the group was not particularly noted 
as influencing cooperation in any of these studies, but 
it was an important factor in the organisation of the 
survey and so needed to be investigated in the develop- 
ment work. 

Developing the Group Situation for the Surveys

The tasks tried out during the development, with the help of teachers' groups, were problems which could be tackled in different ways and had possibilities for extension. They included "everyday" tasks which required planning, and more purely mathematical problems which gave opportunities for pupils to conjecture relationships and test out their conjectures. Tasks were sought which, ideally, could be attempted by both age groups so that some comparisons might be possible between them. As previous research had suggested, the nature and quantity of the verbal interaction varied with the task. A lot of problems were tried and rejected: some produced a lot of animated talk but little mathematics; others some mathematics but little discussion. Finally, four tasks were developed for both age groups, with differences in detail between the two versions in each case. A fifth task was also developed for the 15-year-olds. The tasks were:

Number Chains. Investigating the effect of applying a transformation rule to a number, then to the result of the transformation, and so on successively, thus forming a chain of numbers. The rule used resulted in chains ultimately going into one of two repeating loops. The substantive problem was to find out what kinds of numbers led to a particular loop.

Filling Trays. This was a version of the maxibox problem: finding the largest capacity of an open box or tray which results from cutting squares from the corners of a rectangular sheet of given size.

Class Trip/Day Out. Planning a day out on a 
limited budget given a map of places to visit, times of 
trains, activities and their cost, and menus at cafes. 

Packaging. Designing a package to send three 
delicate glass spheres through the post. 

Total 87 (for 15-year-olds only). Devising a winning strategy in a game for two players or teams. The teams alternately select a number from 1 to 7, and the choices of both sides are added together. The first team to reach 87 is the winner.
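The mathematics behind the Filling Trays task is the volume function V(x) = x(L − 2x)(W − 2x) for a square cut of side x from an L × W sheet. The sheet size used in the surveys is not given here, so the Python sketch below assumes a hypothetical 8 × 5 sheet and simply tabulates cut sizes on a 0.01 grid, much as a group with a calculator might try successive values. The function names are illustrative only.

```python
def tray_volume(cut, length, width):
    """Volume of the open tray made by cutting a cut-by-cut square from each corner."""
    return cut * (length - 2 * cut) * (width - 2 * cut)

def best_cut(length, width, steps_per_unit=100):
    """Try every cut size on a grid of 1/steps_per_unit and keep the largest volume."""
    # A cut must be positive and less than half the shorter side.
    limit = int(min(length, width) / 2 * steps_per_unit)
    best_i = max(range(1, limit),
                 key=lambda i: tray_volume(i / steps_per_unit, length, width))
    cut = best_i / steps_per_unit
    return cut, tray_volume(cut, length, width)

print(best_cut(8, 5))  # (1.0, 18.0) for the hypothetical 8 x 5 sheet
```

For this particular sheet the calculus optimum is exactly x = 1 (V = 18), so the grid search lands on it; for other sheet sizes a grid search only approximates the best cut.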
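One way to find a winning strategy for Total 87 is backward induction from the target, sketched below in Python. It assumes that a team may not push the running total past 87; under that reading the totals to aim for after one's own turn are 7, 15, 23, ..., 79, 87, each one less than a multiple of 8. Whether the survey's version of the game used exactly this rule is an assumption, and the function names are hypothetical.

```python
TARGET = 87    # first team to reach this running total wins
MAX_PICK = 7   # each turn a team adds a number from 1 to 7

def winning_totals(target=TARGET, max_pick=MAX_PICK):
    """Running totals a team should try to leave after its own turn.

    Assumes a pick may not take the total past the target.
    """
    # lose[t] is True when the team about to move, facing total t, loses with best play.
    lose = [False] * (target + 1)
    lose[target] = True  # the other team has just reached the target
    for t in range(target - 1, -1, -1):
        picks = range(1, min(max_pick, target - t) + 1)
        # The mover wins if some pick leaves the opponent facing a losing total.
        lose[t] = not any(lose[t + k] for k in picks)
    return {t for t in range(target + 1) if lose[t]}

def winning_pick(total, target=TARGET, max_pick=MAX_PICK):
    """Pick that moves the total to the next winning total, or None if there is none."""
    k = (target - total) % (max_pick + 1)
    return k or None

print(sorted(winning_totals()))  # [7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 87]
print(winning_pick(0))           # 7: the first team should open with 7
```

Once a team reaches 7, it can always answer the other team's pick n with 8 − n, keeping the total on the winning sequence until it reaches 87.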

The Class Trip (Age 11) and Day Out (Age 15) problems were borrowed directly from topics used previously in the 1-to-1 APU practical surveys. These topics were also used again in the group and individual test situations. The Number Chains and Packaging group topics were also adapted for the 1987 1-to-1 surveys, for comparison purposes, and a version of the Number Chains problem was adapted for a written test in the 1987 surveys.

Presenting the Tasks to the Groups

It was necessary to familiarise pupils with the content of the problem and what was required of them. Each session was divided into three phases. Phases I and III were interactive, while in Phase II the pupils were on their own and no help was allowed. In Phase I an introductory task was given which was related to, or part of, the substantive problem, and prompts could be given. The main purpose of Phase I was to make sure that as many pupils as possible understood what they were asked to do in Phase II. In Phase II no interaction was allowed because it was found that, when it was, assessors became part of the group and it was then difficult to get the group going on their own. Eye contact was avoided and, if pupils asked questions or asked for directions, they were given a neutral response: "That's up to you to decide." If a group attempted to draw in the assessor, the technique used was to feign a lack of interest. But when the group decided they had gone as far as they felt able to, Phase III began, in which the assessor asked for clarification of whatever had not been clear, and the pupils gave a general account of what they had done, why they had done it, and how they felt about the session. A fairly flexible script was used by assessors for Phases I and III.

During the session, assessors wrote notes on what was happening and what was said in as much detail as they could manage. During the surveys a number of assessors exhibited great feats of concentration and the ability to record considerable detail. (They completed their records after each session.) Tape recorders were not used, except in Phase III, because it was not possible to gauge their effect on the pupils before the surveys took place. Barnes and Todd (1977) in their research felt it was unreasonable to combine the effects of being tape recorded for the first time with those of working in groups for the first time.

The sessions lasted anywhere from 10 to 15 minutes up to an hour and a half. Most of the shorter and the longer sessions involved the more mathematical topics which, potentially, could be solved quite quickly. Some groups who could not find a solution were extraordinarily persistent in following up a range of hypotheses.

The Size and Composition of the Groups

Decisions about the size and composition of the groups to be used in the survey were taken after a good deal of piloting to determine what kind of groups seemed to work best together. Usually only one piece of apparatus (e.g. a calculator) was provided in a group so as to emphasise the common aim, but boys were not infrequently observed to grab it. Girls could be at a particular disadvantage in such a situation, especially those from some ethnic minority backgrounds. Friendship groups were considered, but it was found that, if one member was of a much higher attainment than the others, that person would be likely to dominate the group. Groups with more than 3 children tended to split into subgroups; it was more difficult for them to organise themselves and use the available resources effectively. Groups of only 2 pupils provided less discussion than larger groups. For these reasons it was decided to use groups of three pupils of the same sex and approximately the same attainment.

Assessing the Groups 

The assessment schedule was developed with the assistance of groups of teachers experienced in using problem solving and investigative work in their classrooms, and guided by the work of theorists and researchers.

Theoretical perspectives on problem solving place stress either on the abilities needed for problem solving (Piaget, 1959; Krutetskii, 1976) or on the activities engaged in during the solution process (Polya, 1957; Schoenfeld, 1983). The APU is concerned with assessing performance and not with traits or capacities of the person, and so ability models are of less interest in this context. Schoenfeld (1983) has given a detailed list of the knowledge and behaviour necessary for what he believes to be an adequate characterisation of mathematical problem-solving performance. The main categories are: Resources (e.g. facts, algorithmic procedures); Heuristics (e.g. drawing figures, introducing suitable notation); Control (e.g. planning, monitoring, decision making); and Belief Systems (one's "mathematical world view", determinants of an individual's behaviour).

In 1986 a number of marking schemes used for assessing investigative work in schools were collected and reviewed by NFER researchers. Most of them had been produced by teachers who had many years of experience of this activity in their classrooms. They were concerned with carefully written-up post hoc reports of extended investigations and gave some useful indications of possible frameworks. The process objectives most common to the schemes were found to be largely compatible with Schoenfeld's ideas: Formulating the problem (Control); Use of mathematical strategies (Heuristics); Level of mathematics used (Resources); Evaluation and interpretation of results (Belief Systems). Because teachers are, in addition, interested in the way results are communicated and, if the report is by a group, in an individual's personal contribution, relevant categories covering these areas were noted in the review.

Normally the categories were marked on a scale of 4 or 5 points. Each point of the scale might carry an extended but fairly abstract description. For example, a category "Report as a communication" in one scheme describes a top-rated report as one which is: "Logically structured with suitable selection of what to present. Full explanation of the problem, its development and conclusions. Well written and appropriately illustrated with examples, tables and diagrams." A bottom-rated report would be "An untidy collection of results, badly organised, with little or no explanation."

Such descriptions must be relative to the normal standards of the material produced for assessment. Indeed, some teachers preferred to leave it there and simply stated: "Marks 0 to 4 decided by experience of requisite standards."
Categories of performance do not constitute a model of problem solving, and there was interest in the APU research in gathering data which might enable some general picture of the problem-solving process to be derived. Problem solving might be characterised as a cyclic activity which successively refines the direction taken towards a solution: for example, by formulating the problem more precisely, using more efficient methods, etc. More realistically, it is likely to be untidy and opportunistic (Hayes-Roth and Hayes-Roth, 1979). Data from the sessions which might enable a generalised picture of the process to emerge would have to be detailed and chronological. There was interest too in determining relationships between various categories of performance, and with background factors such as the gender and ability of the groups.

In order to achieve these purposes, three sets of data were collected on the performance of each of the groups. In each case it was the group which was rated or categorised, not the individuals within it. It was made clear to the pupils at the beginning of a session that they were to work as a team and come up with an agreed solution. The data sets were: rating scales, a summary of performance, and observations.

Rating Scales. A number of scales were derived from the development work with teachers and guided by the theoretical and empirical work on problem-solving processes. There were eight main scales, with sub-scales in most cases. They related to the areas of social interaction, problem-solving skills, communication, and attitudes. Each scale or sub-scale had four points: 0 (low) to 3 (high). The scales were as follows:

1. Social Interaction. There was one sub-scale relating to the amount of cooperation and a set of categories defining the type of group.

2. Awareness of Problem. This category related to a group's overall grasp of what needed to be done to solve the problem, i.e. their overall strategy. Two sub-scales.

3. Working on the Task. The tactics used in relation to methods and level of mathematical argument. Three sub-scales.

4. Resolution of the Problem. This was an overall judgement of the extent to which the problem had been satisfactorily resolved, made by the assessor on one sub-scale and by the pupils themselves on another.

5. Extension to Problem. Very few pupils suggested additional questions arising out of what they had done. Consequently, this category was largely redundant.

6. Communication within Group. There were sepa- 
rate sub-scales relating to oral, visual, and written 
means of communication. 

7. Communication with Assessor. The three sub-scales related to the way in which a group's report was presented in Phase III.

8. Attitudes. The three sub-scales related to the ratings 
of the pupils' involvement, persistence, and enjoy- 
ment. 

Each point on each scale was described in general terms for the age 11 survey. For the older pupils, the descriptions of points on some of the scales were related more specifically to individual tasks.

Clearly there could be changes in the way groups operated during a session, and assessors were instructed that, in such cases, it was the later rather than the earlier behaviour which should determine the rating given. Thus a group which began cooperatively but ultimately worked individually should be given a low rating for cooperation, while one which began in a fragmented way but finally "gelled" should be given a higher rating. Similarly, in relation to Awareness of Problem, a group which began with little idea how to deal with a problem, but ultimately developed a good strategy, should be given a high rating. Not all of the sub-scales were relevant to every problem, and assessors were instructed not to give a rating if they thought a scale was inappropriate.
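The rule that later behaviour outweighs earlier behaviour, together with the instruction to leave inapplicable sub-scales unrated, can be expressed as a small function. This is only a sketch under stated assumptions: the helper name `final_rating` and the idea of recording a session as a chronological list of provisional 0 to 3 judgements are hypothetical illustrations, not part of the APU procedure, in which assessors judged the session as a whole.

```python
from typing import Optional, Sequence


def final_rating(provisional: Sequence[int], applicable: bool = True) -> Optional[int]:
    """Hypothetical helper: the rating for one sub-scale in one session.

    `provisional` holds judgements in chronological order on the
    0 (low) to 3 (high) scale. Following the briefing rule, the latest
    behaviour decides the rating. None means "no rating given".
    """
    if not applicable or not provisional:
        return None  # scale judged inappropriate for this problem
    for value in provisional:
        if value not in (0, 1, 2, 3):
            raise ValueError(f"rating {value} is outside the 0-3 scale")
    return provisional[-1]  # later behaviour outweighs earlier behaviour
```

For example, a group that began cooperatively but ended up working individually, `final_rating([3, 2, 1])`, receives 1, while a group that finally "gelled", `final_rating([0, 1, 3])`, receives 3.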

Summary of Performance. A second set of data was obtained from the assessors, who were asked to summarise each group's performance under a number of headings. For example, the headings for Filling Trays were: Methods for finding the capacity of the trays; Accuracy of the methods used; Size of trays constructed; Accuracy of construction; Hypotheses generated about the relationships between the dimensions of the trays. For Class Trip the headings included: Awareness of time in planning; Strategies and methods used; Awareness of cost; Recording.

Under each heading were listed the main possibilities which had been noted during the development work. The categories required assessors to make either yes/no decisions (Did the group find the capacity of the trays by measuring, by using a cube, by multiplying length by breadth by height, by using a calculator, etc.?) or ratings (Were the measurements made very accurate, not very accurate, or inaccurate?).

Observations of Group Activity. Observations were recorded by the individual assessors on A4 paper divided lengthwise into sections. One section was for the main observations. Other sections were reserved for comments on the group interaction, on the processes being used by the group, and for recording the time at various points during the session. Assessors recorded in as much detail as they could during a session and then made up their notes when it was completed.
The Surveys

The assessors' task was to administer the assessments in the schools selected for the survey. Over a seven-day period in May 1987 for those participating in the age 11 survey, or in November 1987 for those in the age 15 survey, they travelled to schools in England, Wales, or Northern Ireland. At each school there were usually three groups of three pupils of the target age group and composition, each group being administered one problem.

The assessors were experienced teachers nominated by their Local Education Authorities (LEAs) at the invitation of the NFER. A job description was sent to each invited LEA which emphasised that the teachers nominated should have taught boys and girls of the target age group and should be aware of recent developments in mathematics education. The most typical nominees for the age 11 survey were heads or deputy heads of primary schools, or advisory teachers working across the LEA but with recent successful practice in the classroom. For the age 15 survey the nominees were heads of mathematics departments or advisory teachers. The locations of those invited were distributed as evenly as possible over the geographical area involved. There were 16 assessors altogether at each age level.

In previous APU surveys the majority of nominees for the practical tests were men. Spender (1981) has illustrated that mathematics teachers may respond differently to boys and girls, and so it was decided to control for any effects of the gender of the assessor in the 1987 practical surveys. LEAs were therefore asked to nominate an assessor of a specified gender. While an equitable gender balance of assessors was achieved for the age 11 survey, there was a slight imbalance in favour of men for the older pupils.

The main training provided was a two-day residential conference for each set of assessors, held a few weeks before the respective surveys in May and November. Assessors were also expected to practise administering the assessments in their own schools between their briefing and the actual survey.

At the residential briefing the assessors were given the topic scripts and shown videotapes of groups working on the survey problems. There were sessions in which the teachers practised recording observations in detail, both from videotapes and with children from local schools. At the briefing for the secondary survey, groups of assessors also simulated the assessment situation, a technique which had been used successfully for several years at the briefing of the assessors of the APU 1-to-1 practical tests.

Some time was spent in discussing the nature of performance at different points on the rating scales. However, it was clear that a good deal more time was required than was available for the assessors both to observe and to reflect upon the wide range of ways, revealed during the development work, in which pupils tackled the problems. Consequently, the way assessors interpreted the scales will be examined in the analysis.

The Design of the Survey

The number of schools participating was 100 in the primary survey and 80 in the secondary. The schools were selected randomly in a stratified sample. Three pupils in each sample school were then selected randomly from among those in the target age range, and each of these pupils was assigned to one of the groups to be assessed. The final stage of making up the groups of three members was left to the school. The instructions from the NFER were for schools to choose two further pupils for each group, of the same sex as and similar attainment to the pupil already selected randomly. While there was only one instance in which a school was unable to match the gender of a randomly selected pupil, there were a few cases where a very close attainment match was not possible.

There were two checks on the attainment mix within groups: schools were asked to give estimates of survey pupils' attainment within 20 per cent bands, and an independent estimate of attainment was obtained from the results of a written test taken by the same sample pupils. There was a different test for each age level, but with similar content. The two tests were made up from the banks of APU written test items. The items selected were those relevant to the context of the problem tasks: measuring and spatial concepts relating to the Packaging task; reading tables and money calculations to the Class Trip and Day Out topics; number patterns to Number Chains; and area and volume questions to the Filling Boxes problem.

For the survey administration, topics were randomised over schools and over assessors, with the proviso that in every school the three groups took different topics. Thus there was no possibility that groups could glean any details of the problem they would be asked to solve from those pupils who had already been assessed.

This design resulted in about 70 groups taking each problem in the primary schools and about 60 in the secondary. About 30 of the older groups took the fifth topic. Total 87.
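The proviso that the three groups in a school always take different topics amounts to drawing topics without replacement within each school. The following is a minimal sketch of that allocation step only, assuming the four age 11 topics named in the tables below; the function name `assign_topics` and the school identifiers are hypothetical, and the sketch does not model the randomisation over assessors.

```python
import random
from typing import Dict, List, Optional, Sequence

# The four age 11 topics named in Tables 1 to 3.
AGE_11_TOPICS = ["Number Chains", "Filling Trays", "Class Trip", "Packaging"]


def assign_topics(
    school_ids: Sequence[str],
    topics: Sequence[str] = AGE_11_TOPICS,
    groups_per_school: int = 3,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Hypothetical helper: give each school's groups distinct, random topics.

    Random.sample draws without replacement, so no two groups in the
    same school receive the same topic.
    """
    if groups_per_school > len(topics):
        raise ValueError("not enough topics for a distinct assignment")
    rng = random.Random(seed)
    return {school: rng.sample(list(topics), groups_per_school) for school in school_ids}
```

With 100 schools and 3 groups each, every topic is chosen in a given school with probability 3/4, so the expected number of groups per topic is 75, broadly in line with the "about 70" reported for the primary survey.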

Some Initial Results of the Age 11 Survey

The analysis of the results of the surveys of the two age groups is ongoing, but so far only details of some of the age 11 results are available. These relate to the ratings and provide indications of relationships between the scales and of differences in responses to the four tasks. The importance of looking at the assessors' interpretation of these scales was stressed earlier. There are two ways in which this can be tackled: factor analysing the scales to examine their dimensionality, and relating the assessors' ratings to their detailed observations. The latter have not yet been analysed extensively, but some investigations of dimensionality have taken place. Factor analyses for each topic produced two main factors which were similar for all four topics. These could be described as cognitive and attitudinal factors. The cognitive factor had high loadings on the scales Awareness of Problem, Working on the Task, and Resolution of the Problem. The attitudinal factor had high loadings on the Amount of Cooperation and Attitude scales.

The following results are examples from one scale with a high loading on the cognitive factor, Resolution of the Problem (Assessors' Evaluation), and from one with a high loading on the attitudinal factor, Social Interaction.

Table 1
Ratings of Amount of Cooperation within Groups

                    Percent of groups rated as:
Topic                 0     1     2     3
Number Chains         2    10    42    46
Filling Trays         7     9    43    41
Class Trip            1     8    36    54
Packaging             7    15    56    21

Four main types of group interaction had been identified during the development. They could be placed on a scale of dominance of the group leader, from leaderless to authoritarian. Groups taking Number Chains were most likely to be non-authoritarian; again, this is likely to be more a function of the task than of the groups who took the topic. Of the four topics, it is the one where opinion, in contrast to logical argument, has least validity. The Packaging task had most scope for decisions to be made on the basis of opinion, and therefore to be made by those who wanted to dominate. Girls' groups were given a much higher proportion of the top ratings for cooperation in 3 of the 4 tasks, but boys' groups had more of the top ratings in Number Chains. More girls' groups were classified as leaderless or chaired in two tasks; in the other two tasks the sexes were more evenly balanced in this respect.



Table 2
Ratings of Type of Group

                    Percent of groups rated as:
Topic             Leaderless   Chaired Group   Dominant Leader   Authoritarian Leader   Other
Number Chains         55            21               13                  2                9
Filling Trays         49            23               12                  3               13
Class Trip            44            23               18                  4               11
Packaging             34            17               27                  6               16

The assessor's evaluation was an overall summary rating of the extent to which a group resolved or solved a problem. Table 3 contains the distribution of ratings given by the assessors (0 low, 3 high).



The first question addressed concerns the extent to which the age 11 survey was successful in its aim of producing cooperative problem solving. The assessors' written comments and their discussion at a debriefing meeting held after the survey indicated that it was. This was reflected in their ratings of the amount of cooperation for the four topics on a four-point scale ranging from 0 (low) to 3 (high). These results are summarised in Table 1.

The two "everyday" topics, Class Trip and Packaging, received respectively the highest and lowest number of ratings of 3 for cooperation. This was almost certainly due to different task requirements: there was pressure in Class Trip to make decisions together, while some groups made individual designs for Packaging, although most reached a common final decision. It was encouraging to note the high cooperation ratings for the most purely mathematical topic, Number Chains.
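The comparison of top cooperation ratings can be checked directly against Table 1. The following is a small illustrative sketch, assuming the Table 1 percentages are transcribed as tuples for ratings 0 to 3; the names `TABLE_1` and `top_rating_extremes` are hypothetical.

```python
# Percent of groups at each cooperation rating (0 = low, 3 = high),
# copied from Table 1. Some rows sum to 99 because of rounding.
TABLE_1 = {
    "Number Chains": (2, 10, 42, 46),
    "Filling Trays": (7, 9, 43, 41),
    "Class Trip": (1, 8, 36, 54),
    "Packaging": (7, 15, 56, 21),
}


def top_rating_extremes(table):
    """Return the topics with the largest and smallest share of 3 ratings."""
    ranked = sorted(table, key=lambda topic: table[topic][3])
    return ranked[-1], ranked[0]


most, fewest = top_rating_extremes(TABLE_1)
print(most, fewest)  # Class Trip Packaging
```

Ranking on the share of ratings of 3 alone follows the wording of the text; other summaries, such as a weighted mean rating, would be an additional choice not made in the report.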

Further information is provided by the assessors' categorisation of type of interaction within the group.
Table 3
Assessor's Evaluation

                    Percent of groups rated as:
Topic                 0     1     2     3
Number Chains        12    30    37    21
Filling Trays         7    42    31    20
Class Trip            1    14    61    24
Packaging             3    29    63     6

The "everyday" tasks appear to have been easier than the more obviously mathematical problems, although assessors were reluctant to give a top rating to the design task, Packaging.
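One way to see the pattern in Table 3 is to total the percentage of groups rated 2 or 3 for each topic, alongside the percentage rated 3. This is an illustrative sketch only: treating a rating of 2 or above as "largely resolved" is an assumption made here, not a criterion used in the report, and `percent_at_least` is a hypothetical name.

```python
# Percent of groups at each assessor's-evaluation rating (0 = low, 3 = high),
# copied from Table 3.
TABLE_3 = {
    "Number Chains": (12, 30, 37, 21),
    "Filling Trays": (7, 42, 31, 20),
    "Class Trip": (1, 14, 61, 24),
    "Packaging": (3, 29, 63, 6),
}


def percent_at_least(row, threshold=2):
    """Percent of groups rated at or above `threshold` on the 0-3 scale."""
    return sum(row[threshold:])


for topic, row in TABLE_3.items():
    print(f"{topic}: {percent_at_least(row)}% rated 2 or 3, {row[3]}% rated 3")
```

This gives 85% and 69% for Class Trip and Packaging against 58% and 51% for Number Chains and Filling Trays, while Packaging has the smallest share of top ratings (6%), which matches the two observations in the text.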

Pupils selected for the group survey were due to take the special written test described earlier. Their scores on this test gave an indication of the extent to which the pupils in a group were of similar attainment, as requested. The test scores also provided a comparison of the small group sample with the main sample, some of whom had taken the same questions that appeared in the special test.

The results showed that the mean success rate on the questions in the small group sample test was significantly higher than that on the same questions taken by the main sample (51.3% compared with 48.0%). This finding is not all that surprising, since two of the pupils in each group had been selected by the school and not randomly. So far as ability mix was concerned, about two thirds of the groups had test scores within a range of 15 percentage points. There was an occasional extreme mix (e.g. 8%, 10%, and 49% success).
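The within-group attainment check amounts to taking the spread (maximum minus minimum) of each group's written-test success rates and counting the groups whose spread stays within 15 percentage points. A minimal sketch follows, assuming the 15-point limit is inclusive; the function names and the three-group sample in the usage line are hypothetical, apart from the 8%, 10%, 49% group quoted above.

```python
from typing import Sequence


def score_range(scores: Sequence[float]) -> float:
    """Spread of a group's written-test success rates, in percentage points."""
    return max(scores) - min(scores)


def share_of_matched_groups(groups, limit: float = 15.0) -> float:
    """Fraction of groups whose score range is at most `limit` points."""
    if not groups:
        raise ValueError("no groups supplied")
    matched = sum(1 for g in groups if score_range(g) <= limit)
    return matched / len(groups)


# The extreme mix quoted in the text spans 41 percentage points.
print(score_range((8, 10, 49)))  # 41

# Hypothetical sample: only the first group comes from the report.
share = share_of_matched_groups([(8, 10, 49), (45, 50, 58), (30, 38, 41)])
```

In the hypothetical sample, two of the three groups fall within 15 points, so `share` is about 0.67.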



only: cognitive and attitudinal. 

Findings for the more detailed data which were 
obtained will be reported later. 
Test score was not expected to correlate highly with amount of cooperation, so it is interesting to note the relatively higher correlation for Number Chains, while cooperation was not associated with attainment for Packaging and Filling Trays. It should be noted that the assessors had no knowledge of the pupils' test scores (neither did their schools), nor were they informed of the schools' estimates of pupil ability.
Different educators mean different things when they use the word evaluation, and the evaluation literature provides multiple perceptions of evaluation. The well-known definition suggested by Ralph Tyler almost forty years ago, and still used by many, perceived evaluation as "the process of determining to what extent educational objectives are actually being realized" (Tyler, 1950, p. 69). This definition matched the general tendency in education to associate evaluation with testing, and it limited the scope of evaluation to the measurement of students' achievements. Such an approach was also in keeping with the common sense of politicians and the general public who, on various occasions, requested that educators be accountable for their deeds and provide evidence of their effectiveness in the form of data on improvement in students' performance. Many evaluations of educational programs still focus on changes in students' achievement as a major variable for the assessment of the program. Even when some of them collect data related to the process of implementing the program being evaluated, these data are used mainly as a means of interpreting the findings about students' performance, rather than as a criterion for assessing the quality of the program.

But evaluators experienced many problems in measuring the "really important" impacts of programs (e.g. long-range impacts). They also found it quite difficult to establish a causal relationship between students' participation in a new program and their achievement by implementing a true or even quasi-experimental design, as suggested by Campbell and Stanley (1966) and other research methodologists. Evaluators have also realized that the richness of a program or a project cannot be expressed only by its impact on students' behaviors, nor can the full range of their clients' information needs be served by data on students' test scores alone.

For some time the evaluation literature has proposed extending the scope of information that should be collected regarding each program being evaluated. Stake (1967), in his Countenance Evaluation Model, suggested that two sets of information be collected regarding the program being evaluated: descriptive and judgmental. The descriptive set should focus on intents and observations regarding antecedents (prior conditions that may affect the outcomes of the program), transactions (the process of implementing the program), and outcomes of the program, such as students' achievements but also other outcomes. The judgmental set of information in Stake's model comprises standards and judgments by relevant audiences regarding the same antecedents, transactions, and outcomes.

Guba and Lincoln (1981) extended Stake's approach and applied it to the naturalistic paradigm. They suggested that the evaluator collect five kinds of information: 1) descriptive information regarding the program, its settings and its surrounding conditions; 2) information responsive to concerns of relevant audiences of the evaluation; 3) information about relevant issues; 4) information about values; and 5) information about standards relevant to the worth and merit of the assessments.

Stufflebeam, together with a prominent group of evaluators (Stufflebeam et al., 1971), analyzed various types of decisions and decision-making settings. They endorsed Stufflebeam's CIPP Evaluation Model, suggesting that evaluation focus on four sets of information regarding the program being evaluated: the goals of the program, its design or strategy, its process of implementation, and its outcomes.

The notion that a wide range of information should be collected regarding each educational program has been supported by many other authors in the evaluation literature published in recent years (e.g. Meckenzie, 1983; Nevo, 1983; Dorr-Bremme, 1985; Cooley and Bickel, 1986; Glasman and Nevo, 1988). This was also the perspective of our evaluation study of an elementary school computer-assisted instruction (CAI) mathematics program (the TOAM program). Nevertheless, in planning the evaluation study, we had to work hard to convince our clients that "hard data" on student achievement are not the only thing that could be useful to them in making decisions about the program. And since similar difficulties have also been experienced in other evaluations, we would like to re-emphasize the importance of widening the perspective of program evaluation, to point out some possible methodological solutions, and to discuss the utility of such an approach on the basis of our experience.

The Program and its Evaluation Design 

The TOAM program is an Israeli adaptation of a CAI mathematics program initially developed at Stanford University in the early sixties by Suppes and associates (Suppes et al., 1968). The program was adapted to the local mathematics curriculum and has been used in Grades 2 to 6. Participating students used the computer twice a week, each time for 20 minutes, during which they had an opportunity to practise individually on a graded sequence of exercises and were provided with feedback regarding their performance. The teacher is also provided with a diagnostic summary of the whole class at the end of each period. The computer in this program is used for only 40 minutes a week out of a total of about four hours of weekly mathematics instruction. The computer is used only for practice and diagnosis, while most of the instruction is done within the regular class by means of other teaching methods.

The evaluation was conducted within the framework of the city of Tel Aviv, where the local department of education decided to introduce the TOAM program into schools with a high proportion of culturally disadvantaged students. The purpose of using the program was to help low-achieving students without hindering the progress of advanced students. TOAM computers had been used for some years in the schools of Tel Aviv when the local department of education decided to fund a one-year evaluation to examine the usefulness of the program and how it could be improved.

In light of our perception regarding the scope of evaluation (Nevo, 1983), and on the basis of interaction with our clients and other stakeholders in the program, three major questions were identified as reflecting what might be their main information needs. The following evaluation questions were agreed upon to be addressed by the evaluation:

a. Are the rationale and the structure of the TOAM program based on acceptable educational approaches providing a reasonable chance to affect its target population?

b. Is the program being implemented as 
planned and in an efficient way? 

c. Does the TOAM program have an impact 
on students' achievement in mathematics 
and on their attitudes towards studying 
mathematics? 

Four sources of information were used to address the first question: major documents of the program; interviews with the program personnel; a review of the literature on mathematics education and on computer-assisted instruction, including some meta-analysis studies; and the opinions of four experts, obtained especially for this evaluation.

The second evaluation question was addressed by means of: administrative reports of the program; structured observations in mathematics classes and computer practice sessions (46 observation hours in 9 schools); interviews with teachers and computer personnel; and questionnaires administered to students (n = 241), teachers (n = 191) and principals (n = 16).



For the third evaluation question, data were collected on students' achievement and their attitudes toward mathematics. TOAM computer scores were analyzed for a total of 5254 students in Grades 2 to 6 in 19 schools. Standardized paper-and-pencil tests were administered to 273 TOAM students in Grades 4 to 6 and to 214 students in comparison groups. Attitude questionnaires were administered to 123 TOAM 6th graders and to 118 students in similar comparison classes. Students in comparison groups were selected on the basis of a socio-economic background similar to that of the TOAM group, but random assignment of students to groups was not feasible in this study.

Major Findings of the Evaluation 

A detailed presentation of the data analysis proce- 
dure and findings of this evaluation can be found 
elsewhere (Nevo, 1984; Mecer, 1986; Nevo, in press). 
In this paper only a summary of the major findings will 
be presented as a basis for our discussion on the scope of 
evaluation. Following are major findings regarding 
each evaluation question: 

Are the rationale and the structure of the TOAM program based on acceptable educational approaches providing a reasonable chance to affect its target population?

a. TOAM is based on a behavioristic approach emphasizing the relationships among stimulus, response, and reinforcement. This approach was strongly criticized in the literature, and by the experts consulted in this evaluation, as an approach of limited value, appropriate mainly for learning simple tasks.

b. An extensive review of the literature on CAI and mathematics education showed that computers can be useful in instruction when used in conjunction with regular class instruction, and with close cooperation between the teacher and the computer.



c. Previous studies conducted by the organization which developed and operated the TOAM program around the country, and which showed the effectiveness of TOAM in improving students' achievements in mathematics, were all based on TOAM computer tests rather than on standardized paper-and-pencil tests.



Is the program being implemented as 
planned and in an efficient way? 

a. Review of administrative reports and direct observations in schools showed that the operation of TOAM within the schools was well organized and implemented according to formal instructions, with almost no complaints from participating schools.

b. The twice-weekly computer practice sessions were conducted by special TOAM instructors; the participation of class teachers in those sessions was very limited. One third of the teachers indicated in their questionnaires that they did not regularly attend the computer practice sessions with their students. Our sample observations showed that teachers were absent from more than half of the sessions. There were no regulations regarding teacher presence in their students' computer practice sessions.

c. More than 80 percent of the teachers indicated in their questionnaires that they used the computer reports, provided at the end of each practice session, in planning their lessons. However, in our structured observations of 32 fourth and sixth grade lessons we succeeded in tracing some kind of reference to computer reports in only one third of the classes.

d. The teaching style of teachers in classes participating in the TOAM program was found (in classroom observations and teacher questionnaires) to be similar to the typical teaching style of teachers in regular classes in Israeli schools, and included very little small-group work or individual work by students. However, the tendency to use "non-conventional" teaching methods was slightly stronger among teachers who had participated in the program for more than one year.

e. Teachers seemed to be pleased with the orientation training that they received when they joined the TOAM program, but many of them (30 to 50 percent) asked for additional guidance in teaching gifted students, working in small groups, and dealing with individual differences in heterogeneous classes. More than 50 percent of the teachers did not receive any in-service training during their first year in the program apart from a one-day orientation when they joined the program.

Does the TOAM program have an impact on students' achievement and on their attitudes towards studying mathematics?

a. Analysis of TOAM computer test scores in participating schools showed that the percentage of students reaching the expected minimal requirement level for their grade at the end of the year was significantly higher than an estimated level for non-participating students. However, a high percentage (33 to 85 percent) of participating students in various classes did not reach the minimum requirements determined by the TOAM program by the end of the year.

b. The analysis of the TOAM computer test scores also showed that the progress of high-level students was significantly greater than the progress of low-level students. Thus, the gap between low achievers and high achievers seemed to increase by virtue of the TOAM program.



c. Standardized paper-and-pencil tests administered to 4th and 6th grade students participating in the program and to non-participating students with similar backgrounds showed no statistically significant difference between the overall mean scores of the two groups. However, in two of the six sub-scores of the fourth grade test, a significant difference in favor of the TOAM group was found. No significant differences in sub-scores were found in the sixth grade, but a significant difference was found between the groups in the percentage of students who got high scores on the entire test (more than 75 percent correct answers).

d. Regarding students' attitudes towards mathematics, "math anxiety" was found to be significantly lower in the TOAM group in the sixth grade compared with the comparison group, but no significant differences between the groups were found on the other sub-scales of the attitude questionnaire.

e. Teachers and principals expressed overall 
positive attitudes towards the program and 
thought that TOAM had a positive impact 
on students' achievement, especially on 
good students. 

Summary and Discussion 

In spite of the fact that during the planning phase of the evaluation our clients showed a strong preference for information on students' achievement that would demonstrate the impact of the TOAM program, such information turned out not to be useful when the evaluation study was concluded. Since, as we mentioned earlier, the use of an experimental design within the framework of this study was not feasible, there were some limitations on the inferences that could be drawn from our data on students' achievement in the TOAM groups and the comparison groups. However, it seemed clear that there is no strong evidence to support the claim that TOAM has a significant impact on students' achievement, and that such a claim is unwarranted, at least considering the way the program has actually been implemented. The contradictory findings for the computer tests and the paper-and-pencil tests were interesting. So were the findings showing that TOAM, which was funded within the framework of special support for disadvantaged students, seemed to be increasing the gap between low achievers and high achievers.

But the important question was: what could be done with those findings regarding the impact of TOAM? Soon it became clear that the answer was: "Really not much!" Nobody would make a decision to discontinue the TOAM program in the Tel Aviv schools, since there was no available alternative on the market that could offer a complete set of courseware in mathematics for elementary school classes. It was also apparent that no one would shift funds from a CAI program to other educational projects at a time when the whole educational system seemed to be "hooked" on computers and perceived the introduction of computers into the school as a major effort to modernize the educational system. Actually, anyone willing to discontinue funding of the TOAM program could do so on the grounds of the simplistic rationale and poor implementation found in our evaluation.

When we submitted our final evaluation report it 
was apparent that although the original charge of the 
evaluation was formative as well as summative, its 
major contribution could be only in its formative mode. 
TOAM was there to stay, and the only decisions that 
could be made about it would be related to its improve- 
ment. But not much advice could be derived from the 
test results, at least not as much as could be derived from 
the other findings. 

Our findings regarding the rationale of the program and its structure (the first evaluation question) clearly suggested that TOAM was based on a simplistic approach that has been highly criticized by experts on CAI and mathematics education, as well as in the research literature. Our study also showed (the second evaluation question) that teachers were not getting sufficient training and guidance to incorporate their students' work with the computer into the whole process of teaching and learning mathematics. On the basis of these findings it was quite simple to develop recommendations regarding the improvement of the rationale of the program, the structure of its courseware, and its use in the school. Among other things, we recommended that the organization developing and administering the TOAM program seek advice from the current literature and from additional specialists in CAI and mathematics education to update its courseware and renew its conceptions. We also recommended that an extensive manual for TOAM teachers be developed, and that an effective teacher training and guidance program be developed and implemented.

Obviously, we must continue to seek evidence on the impact of educational programs as part of our evaluation practice. But it is also very important to include in our evaluations activities directed toward the assessment of the program's rationale, its strategy, and its process of operation. If we decide to follow this advice, we will find that there are sufficient tools to do so, some of them old and some of them quite new. In this regard we should remind ourselves of observational techniques (e.g. Simon and Boyer, 1976), content analysis methods, the use of experts' opinions (e.g. Nevo, 1985), and recently developed methods of meta-analysis (e.g. Hedges and Olkin, 1985) for the quantitative synthesis of research literature.
ASSESSMENT OF OPEN-ENDED WORK IN THE SECONDARY SCHOOL



Dylan Williams 


"I don't know what you mean by 'glory'," Alice said. Humpty Dumpty smiled contemptuously.

"Of course you don't - till I tell you. I meant 'there's a nice knock-down argument for you'!"

"But 'glory' doesn't mean 'a nice knock-down argument'," Alice objected.

"When I use a word," Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean - neither more nor less."

"The question is," said Alice, "whether you can make words mean so many different things."

"The question is," said Humpty Dumpty, "which is to be master - that's all." (Carroll, 1871)

What kind of activity?

There does not appear to be any broad consensus about the meaning of the terms "open-ended activity", "problem", and "investigation" when applied to school mathematics. For the purpose of this paper, therefore, I shall use "investigating" to describe the entire spectrum of mathematical activity. This ranges from becoming aware of a domain to be explored, through defining or posing a problem and solving the problem as defined, to extending or reformulating the problem, and then, possibly, going around the cycle again. "Problem solving" is then a distinct phase in "investigating", as, for example, is "problem posing". What precisely constitutes "mathematical activity" is, of course, principally a question about the nature of mathematical knowledge; in other words, of epistemology.

Many distinctions in the nature of knowledge have been proposed. Some of these are intended to apply principally to the domain of mathematics, while others are much more general. Some examples are listed below.

Source                    "Conceptual"                           "Procedural"
Katona (1942)             meaningful apprehension of relations   senseless drill and arbitrary associations
Maier (1945)              productive thinking                    re-productive thinking
Wertheimer (1959)         structural understanding               rote memory
Scheffler (1965)          knowing that                           knowing how to
Tulving (1972)            semantic memory                        episodic memory
Greeno (1973)             propositional knowledge                algorithmic knowledge
Skemp (1976)              relational understanding               instrumental understanding
Piaget (1978)             conceptual understanding               successful action
Anderson (1983)           declarative knowledge                  procedural knowledge
Hiebert & Lefevre (1986)  conceptual knowledge                   procedural knowledge

Most of these distinctions appear to have some commonality; they seem to be addressing different aspects of the same kind of idea. Rather than invent a new pair of words, I shall use the term "conceptual knowledge" as a generic term for the kind of knowledge typified by the entries in the first column of the above list, and "procedural knowledge" for those in the second column.

The emphasis in much recent research, especially that done in North America, appears to have been on how procedural knowledge becomes transformed so that it also exists as conceptual knowledge. This can be interpreted as reflecting the concern of research to make the "traditional" teaching of mathematics more effective. (By "traditional" I mean teaching where mathematical knowledge is "laid out" before the learner, and the learner "makes sense" of it.) Furthermore, the main focus of this research has been "bottom up", in that it has concentrated on relatively simple (but still very complex!) domains such as young children's understanding of arithmetic. In contrast to this approach, it is possible to focus primarily on conceptual knowledge, and to concentrate on how knowledge that exists initially as conceptual knowledge can become "routinised" or "made automatic" so that it also exists as procedural knowledge. Learning that takes place in this way highlights the distinction between procedural knowledge that is "backed up" by conceptual knowledge, and procedural knowledge that is mainly isolated in that domain. However, for a given activity that we might use for assessment, we cannot be sure that we are assessing conceptual rather than procedural knowledge. For example, solving a quadratic equation is essentially a test of procedural knowledge if you know the formula. If, on the other hand, you don't know that such a formula exists, then the task is much more likely to test your conceptual knowledge. Such examples are not confined to the traditional school mathematics curriculum. For example, if we have a cube, and each face is to be painted either black or white, how many distinct arrangements are there if rotations and reflections are not to be counted as different? This is certainly a non-routine task for most, but if you know the Polya-Burnside formula, it is just a matter of following the steps.
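For readers who want to check the cube-colouring example, the following Python sketch counts the arrangements in two ways: by brute force (sorting all 64 colourings into classes of equivalent arrangements) and by Burnside's lemma (averaging, over all symmetries, the number of colourings each symmetry leaves unchanged). The face labelling and the function names are illustrative assumptions, not part of the original task. The symmetries are modelled as the 48 signed permutations of the coordinate axes; the 24 with determinant +1 are the rotations.

```python
from itertools import permutations, product

# Each face of the cube is identified by its outward unit normal (an illustrative labelling).
FACES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
INDEX = {face: i for i, face in enumerate(FACES)}


def parity(perm):
    """Return +1 for an even permutation of the three axes, -1 for an odd one."""
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def face_symmetries(include_reflections):
    """List each cube symmetry as a tuple g, where g[i] is the face that face i moves to."""
    group = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            det = parity(perm) * signs[0] * signs[1] * signs[2]
            if det == -1 and not include_reflections:
                continue  # skip improper symmetries (those involving a reflection)
            moved = [tuple(signs[k] * f[perm[k]] for k in range(3)) for f in FACES]
            group.append(tuple(INDEX[m] for m in moved))
    return group


def count_by_orbits(include_reflections, colours=2):
    """Brute force: keep one canonical representative of each class of colourings."""
    group = face_symmetries(include_reflections)
    representatives = set()
    for colouring in product(range(colours), repeat=6):
        representatives.add(min(tuple(colouring[g[i]] for i in range(6)) for g in group))
    return len(representatives)


def cycles(g):
    """Number of cycles in the permutation g of the six faces."""
    seen, count = set(), 0
    for start in range(len(g)):
        if start not in seen:
            count += 1
            face = start
            while face not in seen:
                seen.add(face)
                face = g[face]
    return count


def count_by_burnside(include_reflections, colours=2):
    """Burnside's lemma: average number of colourings fixed by each symmetry."""
    group = face_symmetries(include_reflections)
    return sum(colours ** cycles(g) for g in group) // len(group)


if __name__ == "__main__":
    for reflections in (False, True):
        print(reflections, count_by_orbits(reflections), count_by_burnside(reflections))
```

Both methods give 10 arrangements whether or not reflections are allowed: every black-and-white colouring of the faces can be rotated into its own mirror image, so counting reflections as equivalent merges no further arrangements.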

Here I would like to introduce the idea that certain mathematical tasks might act as "amplifiers" of the differences between different students' previous experience. If the items quoted above were given to 16-year-old students, it seems likely that the quadratic equation item would tend to increase the effects due to differences in students' past experience, while the cube-colouring task would tend to reduce those effects. Whether there exist tasks that "reduce experience" sufficiently to be useful in this respect, and whether such tasks can be selected, are issues for debate, but I feel that studying this kind of activity will provide us with insights into the general processes of mathematics. Mathematics has often been viewed in the past as "what is left when all the context has been removed." In this sense, I am proposing that by rendering useless as much of a student's procedural knowledge as we can, we can learn much about the essence of mathematical thinking.

Here, I want to make clear that I am not suggesting that a student's procedural knowledge is unimportant. What I am arguing is that by using tasks that reduce its effect on student performance (and perhaps only then), we can begin to look at more general mathematical processes. However, these mathematical processes are useless without mathematical objects upon which to operate. Ultimately, therefore, I see the assessment of these processes as complementary to more traditional forms of assessment, rather than replacing them. To heighten the contrast with the existing paradigm further, this approach can be applied not to relatively simple domains like arithmetic, but to relatively complex domains like students' attempts at solving complex mathematical problems. This immediately raises two questions: how can we engender this kind of activity, and how do we assess it?

What kinds of task? 

The relationship between task and activity is clearly far from straightforward (see, for example, Christiansen and Walther, 1986). Bauersfeld (1979) has pointed to the differences that often exist between the matter intended, the matter taught, and the matter learned. Burton (1980), on the other hand, characterises the important distinction as being between puzzles and problems. What is repeatedly stressed is the importance of the student making the task her own. Any attempt to understand when and how this happens cannot be based on an analysis of the task alone, or on just the cognitive and meta-cognitive characteristics of the student. This realisation is manifest in the notion of belief systems as used by Schoenfeld (1985), in situational analysis (Balacheff, 1985; Depuis, 1985), and, perhaps most significantly, in activity theory (Christiansen and Walther, 1986; Mellin-Olsen, 1987). It is in connection with these non-cognitive factors that the idea of an open-ended activity becomes important. Schoenfeld (1985) has reported (as have many others) that students' attempts at tasks are often distorted by their beliefs. If they think that the teacher has a particular answer in mind, the students will often not be thinking mathematically, but will, instead, be trying to "guess what's in teacher's head." I will therefore use the term "open-ended activity" for a task which presents a more or less clearly defined starting point for a student, but where the exact nature of the goal, and consequently when the activity terminates, is under the control of the student.

To summarise, the stance that I am adopting here is that there do exist tasks that generate activity in students that reduces the effects of procedural knowledge sufficiently to allow us to assume that the degree of success on those tasks is primarily due to conceptual knowledge; that they are valid, in that the activity they generate is, in essence, mathematical; and that they can be presented to students in such a way as to cause the students to "engage" and "make them their own." Here are some candidates:

How many integral-sided triangles can be made with longest side "n"?

How many integral-sided triangles can be made with perimeter "n"?

How many ways are there of giving someone "n" cents?

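A teacher wishing to check students' data for these three candidate tasks might generate the counts by brute force, as in the Python sketch below. The function names are hypothetical, and the coin values (1, 5, 10 and 25 cents) are an assumption, since the task does not say which coins are available. Triangles are counted by their side lengths a <= b <= c with a + b > c, so degenerate "triangles" are excluded and order does not matter.

```python
def triangles_with_longest_side(n):
    """Count integer-sided triangles a <= b <= c with c = n and a + b > n."""
    return sum(1 for a in range(1, n + 1) for b in range(a, n + 1) if a + b > n)


def triangles_with_perimeter(n):
    """Count integer-sided triangles a <= b <= c with a + b + c = n and a + b > c."""
    count = 0
    for a in range(1, n // 3 + 1):
        for b in range(a, (n - a) // 2 + 1):
            c = n - a - b  # the upper limit on b guarantees b <= c
            if a + b > c:
                count += 1
    return count


def ways_to_give(n, coins=(1, 5, 10, 25)):
    """Count ways of making n cents from the (assumed) coin values, ignoring order."""
    ways = [1] + [0] * n  # one way to give 0 cents: give nothing
    for coin in coins:
        for amount in range(coin, n + 1):
            ways[amount] += ways[amount - coin]
    return ways[n]


if __name__ == "__main__":
    print([triangles_with_longest_side(n) for n in range(1, 9)])
    print([triangles_with_perimeter(n) for n in range(3, 13)])
    print([ways_to_give(n) for n in (5, 10, 25, 100)])
```

Such a script only checks the data; the mathematical activity the tasks are meant to provoke lies in generating that data systematically and explaining the patterns in it.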
What kind of assessment? 

In the United Kingdom over the past few years, these "open-ended" tasks have been increasingly used in the teaching of mathematics. A broad consensus does seem to be emerging that any mathematics curriculum which neglects these aspects of the learning of mathematics is deficient in important respects (HMI, 1985). However, while there is much evidence of this kind of activity in classrooms, very little research has been done on how these kinds of thinking might be assessed or evaluated. The major approaches to assessing mathematical activity can be classified by the principal variable used to evaluate the quality of the thinking involved. In the "cognitive demand" approaches, the central feature is (adopting a metaphor from competitive diving) the "degree of difficulty" of the task; or, where there is a series of tasks, the hardest task successfully attempted. If we persist with the diving metaphor, the other approaches can be thought of as assigning central status to the "marks for style": the most important feature is the extent of progress made on a single task.

"Cognitive Demand" Approaches 

In the literature from cognitive and developmental psychology, the work of Piaget (1956), Pascual-Leone (1970) and Case (1985) offers us a number of possible cognitive structures that might be used as the basis of an assessment scheme. The major drawbacks of these schemes are two-fold. In the first place, they tend to have a relatively small number of levels; and, secondly, they tend to be rather difficult to apply to the complex mathematical tasks under consideration here. This appears to be principally because the major research instrument used for assessing the level of development tends to be a graduated series of simple tasks rather than a single complex task.

In geometry, the model proposed by Van Hiele (1986) can be used to assign students' geometric thinking to one of five different levels. Here the emphasis has moved slightly away from cognitive structures and a little towards levels as existing in the organisation of the thinking of the individual. This idea is more completely realised in the SOLO taxonomy developed by Biggs and Collis (1982), which completely eschews the idea of a "hypothetical cognitive structure". SOLO is an acronym for the Structure of the Observed Learning Outcome; and, as its name suggests, it concentrates on assessing the quality of the learning outcome, without speculating about how it was achieved.

These last two models offer significantly more scope for the assessment of complex mathematical tasks, because they deal with complex tasks; however, they share two drawbacks. The first is that the degree of resolution of the assessment instrument tends to be small. The Van Hiele scheme gives about three levels for the mathematical attainment of the age-16 cohort, and the SOLO taxonomy gives about five. This is, of course, a recurring theme: the more levels you get, the less reliable is the allocation of a given piece of work to a particular level. The second drawback is that these schemes do not appear to transfer in any simple way to the kinds of activity being discussed here. It seems, therefore, that both the model offered by Case and the SOLO taxonomy offer considerable scope for the future, but appear too difficult to translate into assessment practice at the moment.

"Extent of Progress" Approaches

Drawing on the work of Polya (1945), Schoenfeld in the United States, and Mason and Burton in the United Kingdom, have developed heuristic models of the problem-solving process (see, for example, Schoenfeld, 1985; Mason, Burton & Stacey, 1982; Mason, 1984; Burton, 1984). These heuristic-based schemes appear, in turn, to have informed the various schemes that have been developed in the UK for the large-scale assessment of mathematical problem-solving, investigation, and exploration. Examples of these are the Department of Education and Science's Working Party on Mathematics Draft Grade Criteria (SEC, 1985), the assessment model proposed by the Oxford Certificate of Educational Achievement (OCEA, 1987a; 1987b), and all the assessment schemes proposed by the examination boards for the examination of coursework in GCSE. Other work in this same tradition of assessing mathematical process has centered on the work of Bell. In a series of studies (Bell, 1976; 1979; Horton, 1979; Galbraith, 1981), Bell and others have examined students' proof-explanations and have elicited structures that are quite general.

All these process-based schemes have tended to regard the cognitive demand of the task as of secondary importance, and have therefore, in effect, treated all tasks as essentially equivalent. Consequently, these process-based schemes would not distinguish between the same process displayed in different problem contexts, even though the difficulty (as determined by, say, facility) might be very different. Clearly, then, what is required is a scheme that can combine the "cognitive demand" approach with the "extent of progress" approach. Such a scheme is probably a long way off, but what follows is an outline of a way in which account can be taken of the degree of difficulty of the task, so that the process-based schemes referred to above can be used with greater precision. As outlined above, the stance adopted in this paper is explicitly constructivist in the sense outlined by, for example, Davis (1984) and Novak (1986). All the student's actions are assumed to be "intelligent" within the student's own frame of reference. In assessing the activity, we are seeking to locate that frame of reference and, as far as is possible, to assess the activity on its own terms. No account is taken of the relationship between the task intended by the teacher and the activity in which the student engages. All that is important is how difficult the "mathematical terrain" was to chart, and the quality of the charting done.



Task 
The tasks that have come to be most frequently associated with open-ended activity in the UK can be characterised as Data-Pattern-Generalisation (DPG) tasks (Wells, 1986, p. 11). Having defined a problem, the student typically generates some data, organises the data, looks for patterns, makes hypotheses, tests them, and, if possible, proves them. The three main phases of activity are therefore the systematic generation of data, deriving relationships, and making proofs. In this paper I will deal only with the first two of these. For detailed accounts of students' proof-explanations see Bell (1976, 1979, 1980) and Galbraith (1981).
Systematic generation of data

In the task called "Sending cards" (GAIM, 1988), students are asked to investigate how the number of cards sent varies with the size of the group if everyone in the group sends a card to everyone else. Most students generate the data systematically here by incrementing the independent variable (the number of people in the group) by one each time, giving rise to the sequence 2, 6, 12, 20, 30, ... (i.e. twice the triangle numbers). In the task "How many rectangles?" (SMILE, 1975) students are asked to investigate how many rectangles are created if a number of horizontal and a number of vertical lines are drawn across a rectangle. This situation is clearly more complex than that in "Sending cards", in that there are two independent variables: the number of horizontal lines and the number of vertical lines. Most students who manage to generate the data systematically do so by holding one of the independent variables fixed and incrementing the other.
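The difference between the 1-dimensional and 2-dimensional ways of generating data described above can be sketched in Python. For "How many rectangles?" the sketch assumes that the sides of the original rectangle count as lines and that every rectangle bounded by two horizontal and two vertical lines is counted; the SMILE task may set its conventions differently. The function names are illustrative only.

```python
from itertools import combinations


def cards_sent(people):
    """Count cards when each person sends one card to every other person."""
    return sum(1 for sender in range(people) for receiver in range(people) if sender != receiver)


def rectangles(horizontal, vertical):
    """Count rectangles formed when lines are drawn across a rectangle.

    Assumed conventions: the original rectangle's own edges count as lines, so
    there are horizontal + 2 horizontal lines and vertical + 2 vertical lines,
    and each choice of two of each bounds exactly one rectangle.
    """
    horizontal_pairs = list(combinations(range(horizontal + 2), 2))
    vertical_pairs = list(combinations(range(vertical + 2), 2))
    return len(horizontal_pairs) * len(vertical_pairs)


if __name__ == "__main__":
    # 1-dimensional Cartesian search: step the only independent variable.
    print("cards:", [cards_sent(n) for n in range(2, 7)])
    # 2-dimensional Cartesian search: hold the horizontal count fixed, step the vertical one.
    for h in range(4):
        print(f"h={h}:", [rectangles(h, v) for v in range(4)])
```

The nested loop in the second search mirrors what most successful students do by hand: each row of the printed table is one fixed value of the first variable.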

Unfortunately, characterising the complexity of a task by the number of independent variables breaks down when we consider a task like "Four squares" (GAIM, in press). Here students are required to generate all possible colourings of a four-region map with four colours, each colour being used exactly once. However, we can generalise the notion of the number of independent variables by introducing the notion of a search space. The search space of a task consists of all possible combinations of the values of the independent variables. The difficulty of carrying out a search is then characterised by the efficiency of various search strategies in exhausting the space.

At this point it is probably worth noting that this idea of a "search space" is different from the idea of a "state space" in the problem-solving literature. Searches of state spaces are designed to reach one particular state (the goal state). In this case, the object is to locate every element of the search space. The strategy used above for "Sending cards" can be termed a linear search strategy, or a 1-dimensional Cartesian search strategy. In the same way, the strategy used for "How many rectangles?" would be termed a 2-dimensional Cartesian search strategy. Using a 4-d Cartesian search strategy on "Four squares" will yield all the elements of the search space, but only at the expense of a considerable number of "disallowed" combinations. In fact it will yield 256 combinations, of which only 24 are allowable: a rejection rate of over 90 percent! However, we (and most students who attempt this activity) can do better than this by using a "tree-like" search strategy. This strategy generates only allowable combinations, and generates all the possible combinations without repeating any of them. It is, in fact, the most common strategy employed by students who are successful in finding all the combinations.
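The contrast between the two strategies for "Four squares" can be made concrete. In the Python sketch below (the colour names are placeholders), the 4-d Cartesian search runs through every assignment of four colours to the four regions and then rejects those that repeat a colour, while the tree-like search only ever extends a partial colouring with a colour not yet used, so it never produces a disallowed or repeated combination. The sketch models only the constraint stated in the text (each colour used exactly once) and ignores the geometry of the map.

```python
from itertools import product

COLOURS = ("red", "green", "blue", "yellow")  # placeholder colour names
REGIONS = 4


def cartesian_search():
    """4-d Cartesian search: generate every combination, then reject disallowed ones."""
    generated = list(product(COLOURS, repeat=REGIONS))
    allowed = [c for c in generated if len(set(c)) == REGIONS]
    return generated, allowed


def tree_search(partial=()):
    """Tree-like search: branch only on unused colours, so every leaf is allowed."""
    if len(partial) == REGIONS:
        return [partial]
    results = []
    for colour in COLOURS:
        if colour not in partial:
            results.extend(tree_search(partial + (colour,)))
    return results


if __name__ == "__main__":
    generated, allowed = cartesian_search()
    rejected = len(generated) - len(allowed)
    print(len(generated), "generated,", len(allowed), "allowed,", f"{rejected / len(generated):.1%} rejected")
    leaves = tree_search()
    print(len(leaves), "generated by the tree search, all distinct:", len(set(leaves)) == len(leaves))
```

The tree search does work only at branches that can still lead to an allowable colouring, which is what makes it both efficient and exhausting in the sense used above.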
To sum up, then, "Four squares" is exhausted by a 4-d Cartesian search strategy, but that strategy is not efficient, while the tree-like search strategy is both efficient and exhausting. These strategies can also be thought of as similar to the production systems used in, for example, Anderson's ACT theory (Anderson, 1983). We can go on to consider tasks for which efficient strategies do not exist. A good example is the task of finding all the pentominoes, in other words finding all the ways of arranging five squares if all the squares must join edge to edge and corner to corner. Most recalcitrant of all are those search spaces for which there is neither an efficient nor an exhausting strategy.
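Pentominoes illustrate the harder case. One common computer approach, sketched below in Python as an illustration rather than as anything proposed in this paper, grows shapes one square at a time and discards duplicates by reducing each shape to a canonical form under rotation, reflection and translation. Because a duplicate can only be recognised after it has been generated, the search is exhausting but not efficient. Shapes that are rotations or reflections of one another are treated as the same here, which is the usual convention behind the count of twelve pentominoes; the function names are hypothetical.

```python
def normalise(cells):
    """Translate a shape so its smallest x and y are 0, and sort its cells."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return tuple(sorted((x - min_x, y - min_y) for x, y in cells))


def canonical(cells):
    """Smallest normalised form over the 4 rotations and their mirror images."""
    forms = []
    for _ in range(4):
        cells = [(y, -x) for x, y in cells]  # rotate a quarter turn
        forms.append(normalise(cells))
        forms.append(normalise([(-x, y) for x, y in cells]))  # mirror image
    return min(forms)


def polyominoes(size):
    """Grow shapes square by square; return the distinct shapes and the candidates tried."""
    shapes = {canonical([(0, 0)])}
    candidates = 0
    for _ in range(size - 1):
        grown = set()
        for shape in shapes:
            for x, y in shape:
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    cell = (x + dx, y + dy)
                    if cell not in shape:
                        candidates += 1  # every attempt counts, duplicates included
                        grown.add(canonical(list(shape) + [cell]))
        shapes = grown
    return shapes, candidates


if __name__ == "__main__":
    shapes, candidates = polyominoes(5)
    print(len(shapes), "distinct pentominoes from", candidates, "candidates")
```

The gap between the number of candidates tried and the number of distinct shapes found is a rough measure of how far this search is from efficient.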

Deriving relationships 

Having generated the data, the next stage is to look for patterns within that data and, where possible, to hypothesise relationships. Clearly, if the value of the dependent variable is always one more than that of the independent variable (e.g. the relationship between the number of fences and fence-posts), that relationship is going to be easier for students to discover than the one in, for example, "Sending cards". Another aspect of the complexity of the mathematical relationship between variables is the way that students choose to express the patterns that they discover. For example, in "Sending cards", students seem to find it easier to describe the sequence as "going up in even numbers" than as "the number of people times the number of people minus one". The first is an example of a term-to-term rule, while the second is a position-to-term rule. In general, the term-to-term rule is "easier" and so more accessible to students. This distinction actually has more to do with how students represent their discovery than with the structure of the problem, and properly belongs on the heuristic- or process-based side. However, I have mentioned it here because there are situations where there is no position-to-term rule, but there is a term-to-term rule (see, for example, the Josephus problem in Engel, 1985, p. 185). The following list is offered as a tentative hierarchy: additive, multiplicative, linear, quadratic, polynomial, exponential, other (e.g. involving hcf or gcd). It is not particularly "robust", since an additive mapping involving very large numbers might be harder than a linear relationship involving small numbers.
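The two descriptions of the "Sending cards" data, and a rough check of where a sequence sits in the tentative hierarchy, can be sketched as follows. The finite-difference test is a standard technique brought in here for illustration, not the author's method: if the k-th differences of the data are constant, a polynomial of degree k fits the terms given. This separates linear from quadratic patterns, but it cannot tell an additive, multiplicative and general linear relationship apart (all have constant first differences), and it proves nothing beyond the terms supplied. The function names are hypothetical.

```python
def term_to_term(count):
    """'Going up in even numbers': start at 2 cards for 2 people, then add 4, 6, 8, ..."""
    terms, step = [2], 4
    while len(terms) < count:
        terms.append(terms[-1] + step)
        step += 2
    return terms


def position_to_term(people):
    """'The number of people times the number of people minus one'."""
    return people * (people - 1)


def constant_difference_order(sequence):
    """Smallest k whose k-th differences are all equal, or None if the data run out first."""
    diffs, order = list(sequence), 0
    while len(diffs) > 1:
        if len(set(diffs)) == 1:
            return order
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        order += 1
    return None


if __name__ == "__main__":
    print(term_to_term(6))
    print([position_to_term(n) for n in range(2, 8)])
    print(constant_difference_order(term_to_term(6)))
```

For the "Sending cards" data both rules produce 2, 6, 12, 20, 30, 42, and the second differences are constant, placing the relationship at the quadratic level of the hierarchy; a doubling sequence never settles to constant differences, so the function returns None for it.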

Summary 

This paper has presented a model for evaluating the "degree of difficulty" of a class of mathematical activities, which can be used to complement heuristic- or process-based assessment schemes in order to give a more accurate indication of the "power" of the mathematical thinking represented by a piece of work. The model characterises this "degree of difficulty" by two factors: the structure of the search space of the problem, and the complexity of the mathematical relationship between the variables.